Data 605 Final
CUNY SPS SPRING 2023
Required Libraries
library(data.table)
library(MASS)
library(Matrix)
library(matrixcalc)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(purrr)
library(corrplot)
library(correlation)
library(knitr)
library(Hmisc)
library(forecast)
library(ggplot2)
library(ggthemes)
library(moments)
library(psych)
library(mctest)
Problem 1
Setting up the required Parameters :
# Since we will use the same set of parameters for the 3 PDFs.
#set the seed - using this allows reproducibility of the sequence of random numbers
set.seed(68)
#We are required to choose a value of n > 3
n<- round(runif(1, 4, 100))
#We are required to choose a value of lambda between 2 and 10
lambda <- round(runif(1, 2, 10))
# The number of observations required is given by N :
N <- 10000
cat("We will use random generated values for 'n' and 'lambda' using the 'runif' function ","\n")
## We will use random generated values for 'n' and 'lambda' using the 'runif' function
cat("The random value of n based on the requirement is : ","\n", (n))
## The random value of n based on the requirement is :
## 93
cat("The random value of lambda based on the requirement is : ", "\n", (lambda))
## The random value of lambda based on the requirement is :
## 7
cat("The required observations are : ", "\n", (N))
## The required observations are :
## 10000
Probability Density 1: X~Gamma
Using R, generate a random variable \(X\) that has 10,000 random Gamma Ɣ PDF values. A Gamma Ɣ PDF is completely described by “n” (a size parameter) and lambda, λ (a shape parameter). Choose any “n” greater than (>) 3 and an expected value (λ) between 2 and 10 (you choose).
#We will use the following function in R: rgamma(n, shape, rate = 1, scale = 1/rate)
cat("For n =",(n), ", lambda = ",(lambda),"and ",(N),"observations : ")
## For n = 93 , lambda = 7 and 10000 observations :
xgamma <- rgamma(N, shape = n, rate = lambda)
cat("The first 10 values of the Gamma PDF are:", "\n", (head(xgamma,10)))
## The first 10 values of the Gamma PDF are:
## 12.87694 14.14674 12.05287 13.22567 14.84226 13.78659 14.46058 14.3831 13.09145 13.75537
Probability Density 2: Y~Sum of Exponentials
Generate 10,000 observations from the sum of \(n\) exponential PDF with rate/shape parameter (\(\lambda\)). The \(n\) and \(\lambda\) must be the same as in the previous case. (e.g., \(mysum\) \(=\) \(rexp\)(10000,\(\lambda\))+\(rexp\)(10000,\(\lambda\)))
# we will use the following function sum(rexp(n, lambda)), i.e. the sum of the rexp function
cat("For n =",(n), ", lambda = ",(lambda),"and ",(N),"observations : ")
## For n = 93 , lambda = 7 and 10000 observations :
sumexp <- numeric(N)
for (i in 1:N) {
  sumexp[i] <- sum(rexp(n, lambda))
}
cat("The first 10 values of the Sum of Exponentials PDF are : ", "\n", (head(sumexp,10)))
## The first 10 values of the Sum of Exponentials PDF are :
## 14.79785 12.77006 13.0302 13.45413 12.98259 12.32394 11.55567 13.12414 12.30901 13.78982
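The loop above can also be written without explicit iteration. The following is a minimal sketch, assuming the same n = 93, lambda = 7 and N = 10000 printed above (hard-coded here so the block runs on its own; `draws` and `sumexp_vec` are illustrative names, not part of the original analysis). Each column of an n-by-N matrix holds one set of n exponential draws, and `colSums()` adds them up.

```r
set.seed(68)
n <- 93        # number of exponentials in each sum
lambda <- 7    # rate of each exponential
N <- 10000     # number of observations

# One column per observation, n exponential draws per column
draws <- matrix(rexp(n * N, rate = lambda), nrow = n, ncol = N)

# Column sums give N observations of the sum of n exponentials
sumexp_vec <- colSums(draws)

length(sumexp_vec)
mean(sumexp_vec)   # should be close to n / lambda
var(sumexp_vec)    # should be close to n / lambda^2
```

The individual values depend on the state of the random number generator when the call is made, so they need not match the listing above; only the distribution is the same.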
Probability Density 3: Z~ Exponential
Generate 10,000 observations from a single exponential pdf with rate/shape parameter (\(\lambda\))
expobs <- rexp(n = N, rate = lambda)
cat("For n =",(n), ", lambda = ",(lambda),"and ",(N),"observations : ", "\n")
## For n = 93 , lambda = 7 and 10000 observations :
cat("The first 10 values of the Exponential PDF are : ", "\n", (head(expobs,10)))
## The first 10 values of the Exponential PDF are :
## 0.05663102 0.2523144 0.1856502 0.1155885 0.7036552 0.1991529 0.003784598 0.2653726 0.1304988 0.08322686
Problem 1a
Calculate the empirical expected value (means) and variances of all three pdfs
Note : The sample mean and variance are estimates of the population mean and variance, respectively, based on the required sample of 10,000 observations.
# We will use the "mean" and "var" functions in R for this computation
cat("For n =",(n), "and lambda = ",(lambda),"and ",(N),"observations : ", "\n")
## For n = 93 and lambda = 7 and 10000 observations :
cat("The Empirical expected value (mean) of the Gamma PDF is:", "\n", (mean(xgamma)))
## The Empirical expected value (mean) of the Gamma PDF is:
## 13.28273
cat("The Empirical variance of the Gamma PDF is:", "\n", (var(xgamma)))
## The Empirical variance of the Gamma PDF is:
## 1.913769
cat("\n")
cat("\n")
cat("The Empirical expected value (mean) of the Sum of Exponentials PDF is:", "\n", (mean(sumexp)))
## The Empirical expected value (mean) of the Sum of Exponentials PDF is:
## 13.28817
cat("The Empirical variance of the Sum of Exponentials PDF is:", "\n", (var(sumexp)))
## The Empirical variance of the Sum of Exponentials PDF is:
## 1.892334
cat("\n")
cat("\n")
cat("The Empirical expected value (mean) of the Exponential PDF is:", "\n", (mean(expobs)))
## The Empirical expected value (mean) of the Exponential PDF is:
## 0.1409875
cat("The Empirical variance of the Exponential PDF is:", "\n", (var(expobs)))
## The Empirical variance of the Exponential PDF is:
## 0.02005992
cat("\n")
Problem 1b
Using calculus, calculate the expected value and variance of the Gamma pdf (X). Using the moment generating function for exponentials, calculate the expected value of the single exponential (Z) and the sum of exponentials (Y)
Probability Density 1: X~Gamma
The Gamma function is defined as:
\(\Gamma(\alpha) = \int_{0}^{\infty} y^{\alpha - 1} e^{-y} \, dy\) for \(\alpha > 0\)
The expected value of a Gamma-distributed random variable \(X\) with density \(f(x)\) is given by:
\(E(X) = \int_{0}^{\infty} x \, f(x) \, dx\)
\(\begin{split} \mathrm{E}(X) &= \frac{a}{b} \int_{0}^{\infty} \mathrm{Gam}(x; a+1, b) \, \mathrm{d}x \\&= \frac{a}{b} \; . \end{split}\)
For our computation, \(a\) = \(n\) = 93 and \(b\) = \(\lambda\) = 7, as set above.
\(\implies\) \(E(X)\) = \(\frac{93}{7}\) ≈ 13.29
Similarly, the variance is given by \(\frac{n}{\lambda^{2}}\) = \(\frac{93}{7^{2}}\) ≈ 1.90
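The calculus behind both results can be written out explicitly. The following is a sketch, assuming the shape-rate form of the density, \(f(x) = \frac{b^{a}}{\Gamma(a)} x^{a-1} e^{-bx}\) for \(x > 0\), which matches the call rgamma(N, shape = n, rate = lambda) used above. It uses the identities \(\int_{0}^{\infty} x^{k-1} e^{-bx} \, dx = \Gamma(k)/b^{k}\) and \(\Gamma(k+1) = k\,\Gamma(k)\).

```latex
\begin{aligned}
E(X)   &= \int_{0}^{\infty} x \, \frac{b^{a}}{\Gamma(a)} x^{a-1} e^{-bx} \, dx
        = \frac{b^{a}}{\Gamma(a)} \cdot \frac{\Gamma(a+1)}{b^{a+1}}
        = \frac{a}{b} \\
E(X^2) &= \int_{0}^{\infty} x^{2} \, \frac{b^{a}}{\Gamma(a)} x^{a-1} e^{-bx} \, dx
        = \frac{b^{a}}{\Gamma(a)} \cdot \frac{\Gamma(a+2)}{b^{a+2}}
        = \frac{a(a+1)}{b^{2}} \\
\mathrm{Var}(X) &= E(X^2) - E(X)^2
        = \frac{a(a+1)}{b^{2}} - \frac{a^{2}}{b^{2}}
        = \frac{a}{b^{2}}
\end{aligned}
```

With \(a = 93\) and \(b = 7\) this gives \(93/7 \approx 13.29\) and \(93/49 \approx 1.90\), close to the empirical values 13.28273 and 1.913769 from Problem 1a.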
Probability Density 3: Z~ Exponential - Expected Value using MGF
The following proof is from TSingh - Assignment 9
The MGF of the exponential distribution is given by:
\(g_{X}(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_{X}(x) \, dx\)
With \(f_{X}(x) = λe^{-λx}\) for \(x \ge 0\): \(\Rightarrow\) \(g(t) = \frac{λe^{(t-λ)x}}{t-λ} \Big|_{0}^{∞} = \frac{λ}{λ-t}\) for \(t < λ\)
The First Moment
\(\Rightarrow\) \(g'(t)=\frac { λ }{ (λ-t)^{ 2 } }\)
\(\Rightarrow\) \(g'(0)=\frac { λ }{ (λ-0)^{ 2 } }\)
\(\Rightarrow\) \(g'(0) = \frac { λ }{ λ^{ 2 } } =\frac { 1 }{ λ }\), which is the expected value (first moment)
\(\implies\) For our computation, the expected value (first moment) = \(\frac { 1 }{ λ }\) = \(\frac { 1 }{ 7 }\) ≈ 0.143
Probability Density 2: Y~Sum of Exponentials - Expected Value using MGF
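The following is a sketch of this step, assuming \(Y = Z_1 + \dots + Z_n\) where the \(Z_i\) are independent exponentials with rate \(\lambda\). Under independence the MGF of a sum is the product of the individual MGFs, each equal to \(\frac{\lambda}{\lambda - t}\) for \(t < \lambda\).

```latex
\begin{aligned}
g_Y(t)  &= \prod_{i=1}^{n} g_{Z_i}(t)
         = \left( \frac{\lambda}{\lambda - t} \right)^{n}, \quad t < \lambda \\
g_Y'(t) &= \frac{n \, \lambda^{n}}{(\lambda - t)^{n+1}} \\
g_Y'(0) &= \frac{n}{\lambda} = \frac{93}{7} \approx 13.29
\end{aligned}
```

This is the MGF of a Gamma distribution with shape \(n\) and rate \(\lambda\), and the result is close to the empirical mean of 13.28817 reported for the sum of exponentials in Problem 1a.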
1c. Probability
For pdf Z (the exponential), calculate empirically probabilities a through c. Then evaluate through calculus whether the memoryless property holds
a
For \(P(Z>\lambda | Z>\frac{\lambda}{2})\)
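# Evaluates 1 - F(mean(expobs)) for an exponential distribution with rate lambda/2
Emp_prob_a <- 1-(pexp((mean(expobs)),lambda/2))
Emp_prob_a
## [1] 0.6105126
b
For \(P(Z>2\lambda | Z>\lambda)\)
# Evaluates 1 - F(mean(expobs)) for an exponential distribution with rate 2*lambda
Emp_prob_b <- 1-(pexp((mean(expobs)),2*lambda))
Emp_prob_b
## [1] 0.1389244
c
For \(P(Z>3\lambda | Z>\lambda)\)
# Evaluates 1 - F(mean(expobs)) for an exponential distribution with rate 3*lambda
Emp_prob_c <- 1-(pexp((mean(expobs)),3*lambda))
Emp_prob_c
## [1] 0.0517807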
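The conditional probabilities in a through c can also be estimated directly from the simulated sample, as the share of observations above the larger threshold among those above the smaller one. The following is a minimal sketch under stated assumptions: `cond_prob` and `memoryless_prob` are hypothetical helpers, and the sample is regenerated here with lambda = 7 so the block runs on its own. For an exponential with rate lambda, the memoryless property \(P(Z>a \mid Z>b) = P(Z>a-b)\) predicts the closed form \(e^{-\lambda(a-b)}\). With lambda = 7 a sample of this size is not expected to contain any value above lambda/2 = 3.5, so the helper returns NA when the conditioning event is never observed.

```r
set.seed(68)
lambda <- 7
z <- rexp(10000, rate = lambda)

# Empirical P(Z > a | Z > b) for a >= b; NA if no observation exceeds b
cond_prob <- function(z, a, b) {
  denom <- sum(z > b)
  if (denom == 0) return(NA_real_)
  sum(z > a) / denom
}

# Value implied by the memoryless property: P(Z > a - b) = exp(-lambda * (a - b))
memoryless_prob <- function(lambda, a, b) exp(-lambda * (a - b))

# Thresholds from part a: the conditioning event is extremely rare here
cond_prob(z, lambda, lambda / 2)
memoryless_prob(lambda, lambda, lambda / 2)

# Thresholds on the scale of the simulated data, for comparison
cond_prob(z, 0.2, 0.1)
memoryless_prob(lambda, 0.2, 0.1)   # exp(-0.7)
```

When the thresholds sit within the range of the data, the empirical ratio and the closed form should agree up to sampling error, which is what the memoryless property implies.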
1d
Loosely investigate whether P(YZ) = P(Y) P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities
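One way to approach this is sketched below, assuming Y and Z are simulated independently with n = 93, lambda = 7 and N = 10000 (regenerated here so the block is self-contained; `quartile_bin`, `joint` and `indep` are illustrative names). Each variable is binned into its own quartiles, the joint table is converted to proportions, and it is compared with the outer product of the marginal proportions.

```r
set.seed(68)
n <- 93; lambda <- 7; N <- 10000

# Y: sum of n exponentials; Z: a single exponential
y <- colSums(matrix(rexp(n * N, rate = lambda), nrow = n))
z <- rexp(N, rate = lambda)

# Assign each value to its quartile bin
quartile_bin <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, 0.25)),
      include.lowest = TRUE, labels = c("Q1", "Q2", "Q3", "Q4"))
}

counts <- table(Y = quartile_bin(y), Z = quartile_bin(z))
joint  <- prop.table(counts)      # joint probabilities P(Y in Qi, Z in Qj)
marg_y <- rowSums(joint)          # marginal probabilities for Y
marg_z <- colSums(joint)          # marginal probabilities for Z
indep  <- outer(marg_y, marg_z)   # product of marginals, P(Y) * P(Z)

round(joint, 4)
round(indep, 4)
max(abs(joint - indep))           # largest gap between joint and product
chisq.test(counts)$p.value        # formal test of independence
```

If the joint proportions stay close to the product of the marginals in every cell, the data are consistent with P(YZ) = P(Y) P(Z).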
Problem 2
Overview : https://www.kaggle.com/
Compete in the House Prices: Advanced Regression Techniques competition, provide r code for the following requirements :
Descriptive and Inferential Statistics. Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
Importing the data :
# Import provided datasets stored on Github
p2_train <- data.frame(read.csv('https://raw.githubusercontent.com/tagensingh/sps_data605_final_p_2/main/train.csv', header = T, sep = ","))
p2_test <- data.frame(read.csv('https://raw.githubusercontent.com/tagensingh/sps_data605_final_p_2/main/test.csv', header = T, sep = ","))
p2_test$SalePrice <- 0
Descriptive Statistics
Provide univariate descriptive statistics and appropriate plots for the training data set
# Summary of dataset :
kable(head(p2_train))
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60 | RL | 65 | 8450 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NA | Attchd | 2003 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2008 | WD | Normal | 208500 |
| 2 | 20 | RL | 80 | 9600 | Pave | NA | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 5 | 2007 | WD | Normal | 181500 |
| 3 | 60 | RL | 68 | 11250 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 9 | 2008 | WD | Normal | 223500 |
| 4 | 70 | RL | 60 | 9550 | Pave | NA | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 5 | 60 | RL | 84 | 14260 | Pave | NA | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 12 | 2008 | WD | Normal | 250000 |
| 6 | 50 | RL | 85 | 14115 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | Mitchel | Norm | Norm | 1Fam | 1.5Fin | 5 | 5 | 1993 | 1995 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | Wood | Gd | TA | No | GLQ | 732 | Unf | 0 | 64 | 796 | GasA | Ex | Y | SBrkr | 796 | 566 | 0 | 1362 | 1 | 0 | 1 | 1 | 1 | 1 | TA | 5 | Typ | 0 | NA | Attchd | 1993 | Unf | 2 | 480 | TA | TA | Y | 40 | 30 | 0 | 320 | 0 | 0 | NA | MnPrv | Shed | 700 | 10 | 2009 | WD | Normal | 143000 |
# Dimension of Dataset : Rows X Columns
dim(p2_train)
## [1] 1460 81
# Structure of Columns
str(p2_train)
## 'data.frame': 1460 obs. of 81 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MSSubClass : int 60 20 60 70 60 50 20 60 50 190 ...
## $ MSZoning : chr "RL" "RL" "RL" "RL" ...
## $ LotFrontage : int 65 80 68 60 84 85 75 NA 51 50 ...
## $ LotArea : int 8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr NA NA NA NA ...
## $ LotShape : chr "Reg" "Reg" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "FR2" "Inside" "Corner" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
## $ Condition1 : chr "Norm" "Feedr" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "2Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 7 6 7 7 8 5 8 7 7 5 ...
## $ OverallCond : int 5 8 5 5 5 5 5 6 5 6 ...
## $ YearBuilt : int 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
## $ YearRemodAdd : int 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
## $ RoofStyle : chr "Gable" "Gable" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
## $ Exterior2nd : chr "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
## $ MasVnrType : chr "BrkFace" "None" "BrkFace" "None" ...
## $ MasVnrArea : int 196 0 162 0 350 0 186 240 0 0 ...
## $ ExterQual : chr "Gd" "TA" "Gd" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "PConc" "CBlock" "PConc" "BrkTil" ...
## $ BsmtQual : chr "Gd" "Gd" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "Gd" ...
## $ BsmtExposure : chr "No" "Gd" "Mn" "No" ...
## $ BsmtFinType1 : chr "GLQ" "ALQ" "GLQ" "ALQ" ...
## $ BsmtFinSF1 : int 706 978 486 216 655 732 1369 859 0 851 ...
## $ BsmtFinType2 : chr "Unf" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : int 0 0 0 0 0 0 0 32 0 0 ...
## $ BsmtUnfSF : int 150 284 434 540 490 64 317 216 952 140 ...
## $ TotalBsmtSF : int 856 1262 920 756 1145 796 1686 1107 952 991 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "Ex" "Ex" "Ex" "Gd" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 856 1262 920 961 1145 796 1694 1107 1022 1077 ...
## $ X2ndFlrSF : int 854 0 866 756 1053 566 0 983 752 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
## $ BsmtFullBath : int 1 0 1 1 1 1 1 1 0 1 ...
## $ BsmtHalfBath : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 2 2 2 1 2 1 2 2 2 1 ...
## $ HalfBath : int 1 0 1 0 1 1 0 1 0 0 ...
## $ BedroomAbvGr : int 3 3 3 3 4 1 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 2 2 ...
## $ KitchenQual : chr "Gd" "TA" "Gd" "Gd" ...
## $ TotRmsAbvGrd : int 8 6 6 7 9 5 7 7 8 5 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 1 1 1 1 0 1 2 2 2 ...
## $ FireplaceQu : chr NA "TA" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Detchd" ...
## $ GarageYrBlt : int 2003 1976 2001 1998 2000 1993 2004 1973 1931 1939 ...
## $ GarageFinish : chr "RFn" "RFn" "RFn" "Unf" ...
## $ GarageCars : int 2 2 2 3 3 2 2 2 2 1 ...
## $ GarageArea : int 548 460 608 642 836 480 636 484 468 205 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 0 298 0 0 192 40 255 235 90 0 ...
## $ OpenPorchSF : int 61 0 42 35 84 30 57 204 0 4 ...
## $ EnclosedPorch: int 0 0 0 272 0 0 0 228 205 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 320 0 0 0 0 ...
## $ ScreenPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr NA NA NA NA ...
## $ Fence : chr NA NA NA NA ...
## $ MiscFeature : chr NA NA NA NA ...
## $ MiscVal : int 0 0 0 0 0 700 0 350 0 0 ...
## $ MoSold : int 2 5 9 2 12 10 8 11 4 1 ...
## $ YrSold : int 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition: chr "Normal" "Normal" "Normal" "Abnorml" ...
## $ SalePrice : int 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
Enhancing the Training Dataset
For our analysis we will convert some categorical columns to numerical values, using the values provided in the description file. These operations will be done on both the training and test datasets.
# Duplicating categorical columns and converting to numerical for additional analysis
# Adding Quantified Foundation Column
p2_train$Foundation_q = p2_train$Foundation
p2_train$Foundation_q <- c(Wood=1, Stone=2, Slab=3, PConc=4, CBlock=5, BrkTil=6)[p2_train$Foundation_q]
p2_test$Foundation_q = p2_test$Foundation
p2_test$Foundation_q <- c(Wood=1, Stone=2, Slab=3, PConc=4, CBlock=5, BrkTil=6)[p2_test$Foundation_q]
# Adding Quantified Basement Type Column
p2_train$BsmtFinType2_q = p2_train$BsmtFinType2
p2_train$BsmtFinType2_q <- c(Unf=1, LwQ=2, Rec=3, BLQ=4, ALQ=5, GLQ=6)[p2_train$BsmtFinType2_q]
p2_test$BsmtFinType2_q = p2_test$BsmtFinType2
p2_test$BsmtFinType2_q <- c(Unf=1, LwQ=2, Rec=3, BLQ=4, ALQ=5, GLQ=6)[p2_test$BsmtFinType2_q]
# Adding Quantified Heating Quality Column
p2_train$HeatingQC_q = p2_train$HeatingQC
p2_train$HeatingQC_q <- c(Po=1, Fa=2, TA=3, Gd=4, Ex=5)[p2_train$HeatingQC_q]
p2_test$HeatingQC_q = p2_test$HeatingQC
p2_test$HeatingQC_q <- c(Po=1, Fa=2, TA=3, Gd=4, Ex=5)[p2_test$HeatingQC_q]
# Adding Quantified Electrical Quality Column
p2_train$Electrical_q = p2_train$Electrical
p2_train$Electrical_q <- c(Mix=1, FuseP=2, FuseF=3, FuseA=4, SBrkr=5)[p2_train$Electrical_q]
p2_test$Electrical_q = p2_test$Electrical
p2_test$Electrical_q <- c(Mix=1, FuseP=2, FuseF=3, FuseA=4, SBrkr=5)[p2_test$Electrical_q]
# Adding Quantified Kitchen Quality Column
p2_train$KitchenQual_q = p2_train$KitchenQual
p2_train$KitchenQual_q <- c(Po=1, Fa=2, TA=3, Gd=4, Ex=5)[p2_train$KitchenQual_q]
p2_test$KitchenQual_q = p2_test$KitchenQual
p2_test$KitchenQual_q <- c(Po=1, Fa=2, TA=3, Gd=4, Ex=5)[p2_test$KitchenQual_q]
# Adding Quantified Garage Condition Column
p2_train$GarageCond_q = p2_train$GarageCond
p2_train$GarageCond_q<- c(Po=1, Fa=2, TA=3, Gd=4, Ex=5)[p2_train$GarageCond_q]
p2_test$GarageCond_q = p2_test$GarageCond
p2_test$GarageCond_q<- c(Po=1, Fa=2, TA=3, Gd=4, Ex=5)[p2_test$GarageCond_q]
# Adding Quantified Fence Condition Column
p2_train$Fence_q = p2_train$Fence
p2_train$Fence_q <- c(MnWw=1, GdWo=2, MnPrv=3, GdPrv=4)[p2_train$Fence_q]
p2_test$Fence_q = p2_test$Fence
p2_test$Fence_q <- c(MnWw=1, GdWo=2, MnPrv=3, GdPrv=4)[p2_test$Fence_q]
# Some Cleanup to account for reserved "NA"
p2_train <- replace(p2_train,is.na(p2_train),0)
p2_test <- replace(p2_test,is.na(p2_test),0)
# Printing the enhanced data frame with new quantitative columns
p2_train %>% select(order(colnames(p2_train)))
str(p2_train)
## 'data.frame': 1460 obs. of 88 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MSSubClass : int 60 20 60 70 60 50 20 60 50 190 ...
## $ MSZoning : chr "RL" "RL" "RL" "RL" ...
## $ LotFrontage : num 65 80 68 60 84 85 75 0 51 50 ...
## $ LotArea : int 8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr "0" "0" "0" "0" ...
## $ LotShape : chr "Reg" "Reg" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "FR2" "Inside" "Corner" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
## $ Condition1 : chr "Norm" "Feedr" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "2Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 7 6 7 7 8 5 8 7 7 5 ...
## $ OverallCond : int 5 8 5 5 5 5 5 6 5 6 ...
## $ YearBuilt : int 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
## $ YearRemodAdd : int 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
## $ RoofStyle : chr "Gable" "Gable" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
## $ Exterior2nd : chr "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
## $ MasVnrType : chr "BrkFace" "None" "BrkFace" "None" ...
## $ MasVnrArea : num 196 0 162 0 350 0 186 240 0 0 ...
## $ ExterQual : chr "Gd" "TA" "Gd" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "PConc" "CBlock" "PConc" "BrkTil" ...
## $ BsmtQual : chr "Gd" "Gd" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "Gd" ...
## $ BsmtExposure : chr "No" "Gd" "Mn" "No" ...
## $ BsmtFinType1 : chr "GLQ" "ALQ" "GLQ" "ALQ" ...
## $ BsmtFinSF1 : int 706 978 486 216 655 732 1369 859 0 851 ...
## $ BsmtFinType2 : chr "Unf" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : int 0 0 0 0 0 0 0 32 0 0 ...
## $ BsmtUnfSF : int 150 284 434 540 490 64 317 216 952 140 ...
## $ TotalBsmtSF : int 856 1262 920 756 1145 796 1686 1107 952 991 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "Ex" "Ex" "Ex" "Gd" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 856 1262 920 961 1145 796 1694 1107 1022 1077 ...
## $ X2ndFlrSF : int 854 0 866 756 1053 566 0 983 752 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
## $ BsmtFullBath : int 1 0 1 1 1 1 1 1 0 1 ...
## $ BsmtHalfBath : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 2 2 2 1 2 1 2 2 2 1 ...
## $ HalfBath : int 1 0 1 0 1 1 0 1 0 0 ...
## $ BedroomAbvGr : int 3 3 3 3 4 1 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 2 2 ...
## $ KitchenQual : chr "Gd" "TA" "Gd" "Gd" ...
## $ TotRmsAbvGrd : int 8 6 6 7 9 5 7 7 8 5 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 1 1 1 1 0 1 2 2 2 ...
## $ FireplaceQu : chr "0" "TA" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Detchd" ...
## $ GarageYrBlt : num 2003 1976 2001 1998 2000 ...
## $ GarageFinish : chr "RFn" "RFn" "RFn" "Unf" ...
## $ GarageCars : int 2 2 2 3 3 2 2 2 2 1 ...
## $ GarageArea : int 548 460 608 642 836 480 636 484 468 205 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 0 298 0 0 192 40 255 235 90 0 ...
## $ OpenPorchSF : int 61 0 42 35 84 30 57 204 0 4 ...
## $ EnclosedPorch : int 0 0 0 272 0 0 0 228 205 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 320 0 0 0 0 ...
## $ ScreenPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr "0" "0" "0" "0" ...
## $ Fence : chr "0" "0" "0" "0" ...
## $ MiscFeature : chr "0" "0" "0" "0" ...
## $ MiscVal : int 0 0 0 0 0 700 0 350 0 0 ...
## $ MoSold : int 2 5 9 2 12 10 8 11 4 1 ...
## $ YrSold : int 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition : chr "Normal" "Normal" "Normal" "Abnorml" ...
## $ SalePrice : int 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
## $ Foundation_q : num 4 5 4 6 4 1 4 5 6 6 ...
## $ BsmtFinType2_q: num 1 1 1 1 1 1 1 4 1 1 ...
## $ HeatingQC_q : num 5 5 5 4 5 5 5 5 4 5 ...
## $ Electrical_q : num 5 5 5 5 5 5 5 5 3 5 ...
## $ KitchenQual_q : num 4 3 4 4 4 3 4 3 3 3 ...
## $ GarageCond_q : num 3 3 3 3 3 3 3 3 3 3 ...
## $ Fence_q : num 0 0 0 0 0 3 0 0 0 0 ...
kable(head(p2_train))
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | Foundation_q | BsmtFinType2_q | HeatingQC_q | Electrical_q | KitchenQual_q | GarageCond_q | Fence_q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60 | RL | 65 | 8450 | Pave | 0 | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | 0 | Attchd | 2003 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | WD | Normal | 208500 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 2 | 20 | RL | 80 | 9600 | Pave | 0 | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | WD | Normal | 181500 | 5 | 1 | 5 | 5 | 3 | 3 | 0 |
| 3 | 60 | RL | 68 | 11250 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | WD | Normal | 223500 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 4 | 70 | RL | 60 | 9550 | Pave | 0 | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2006 | WD | Abnorml | 140000 | 6 | 1 | 4 | 5 | 4 | 3 | 0 |
| 5 | 60 | RL | 84 | 14260 | Pave | 0 | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | WD | Normal | 250000 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 6 | 50 | RL | 85 | 14115 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | Mitchel | Norm | Norm | 1Fam | 1.5Fin | 5 | 5 | 1993 | 1995 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | Wood | Gd | TA | No | GLQ | 732 | Unf | 0 | 64 | 796 | GasA | Ex | Y | SBrkr | 796 | 566 | 0 | 1362 | 1 | 0 | 1 | 1 | 1 | 1 | TA | 5 | Typ | 0 | 0 | Attchd | 1993 | Unf | 2 | 480 | TA | TA | Y | 40 | 30 | 0 | 320 | 0 | 0 | 0 | MnPrv | Shed | 700 | 10 | 2009 | WD | Normal | 143000 | 1 | 1 | 5 | 5 | 3 | 3 | 3 |
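The `_q` columns shown above all come from the same pattern: a named numeric vector is indexed by the category labels. That pattern could be wrapped in a small helper to avoid repeating it for every column and dataset. The following is a minimal sketch, not the code used in this analysis: `recode_ordinal`, `quality_scores` and the `toy` data frame are hypothetical names introduced for illustration.

```r
# Hypothetical helper: map category labels to numeric scores via a named vector
recode_ordinal <- function(x, scores) {
  unname(scores[as.character(x)])
}

# Score table shared by the quality-type columns (Po < Fa < TA < Gd < Ex)
quality_scores <- c(Po = 1, Fa = 2, TA = 3, Gd = 4, Ex = 5)

# Toy data standing in for the training or test set
toy <- data.frame(
  HeatingQC   = c("Ex", "Gd", NA, "TA"),
  KitchenQual = c("Gd", "TA", "Ex", NA)
)

# Add one "_q" column per categorical column
for (col in c("HeatingQC", "KitchenQual")) {
  toy[[paste0(col, "_q")]] <- recode_ordinal(toy[[col]], quality_scores)
}

toy
```

Missing labels map to NA, which leaves the decision about how to treat them (for example replacing them with 0, as done above) as a separate, explicit step.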
Investigative Scatter Plots
p2_train$OverallCond_factor <- as.factor(as.character(p2_train$OverallCond))
ggplot(p2_train, aes(x=OverallCond, y=SalePrice, fill=OverallCond_factor)) + geom_boxplot()
p2_train$OverallCond_factor<-NULL
ggplot(p2_train, aes(x=Neighborhood, y=SalePrice, fill=Neighborhood)) + geom_boxplot()+ coord_flip()
A graphical view of the Sales Price spread vs Year Remodeled. Note that we use the year remodeled rather than the year built because, for homes that were not remodeled, the remodel year is recorded as the year built.
ggplot(p2_train, aes(x = YearRemodAdd, y = SalePrice)) +
  geom_point() +
  geom_smooth(method=lm) +
  scale_y_continuous(labels = scales::comma)
## `geom_smooth()` using formula = 'y ~ x'

Correlation Matrices
Derive a correlation matrix for any three quantitative variables in the dataset
Note: going a step further, we compute the correlation matrix for a wider range of quantitative variables below.
corr_data<-dplyr::select(p2_train,SalePrice,LotArea,BsmtFinSF2,GarageArea,YearRemodAdd,OverallCond,TotalBsmtSF,GrLivArea,HeatingQC_q,Electrical_q,KitchenQual_q,Fence_q,GarageCond_q)
corr_matrix<-round(cor(corr_data),4)
#Correlation Matrix with correlation matrix coefficients
corrplot(corr_matrix, method = 'number') # colorful number
# Another Visual of the Correlation Matrix
corrplot(corr_matrix, order = 'hclust', addrect = 2)
Correlation Hypothesis Testing
We are required to test 3 pairs; we will test an additional 3 pairs to solidify the concept.
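Running several tests raises the question of family-wise error. The following is a rough sketch, assuming the tests are independent and that each uses a significance level of 0.20 to match the 80% confidence level (`fwer` is a hypothetical helper, and the p-values passed to `p.adjust()` are illustrative, not taken from the data). It computes the probability of at least one false rejection across the family.

```r
# Probability of at least one false rejection across m independent tests
fwer <- function(alpha, m) 1 - (1 - alpha)^m

alpha <- 0.20
fwer(alpha, 3)   # the three required pairs
fwer(alpha, 6)   # all six pairs tested in this section

# Bonferroni: per-test level that keeps the family-wise rate at or below alpha
alpha / 6

# The same correction expressed as adjusted p-values (illustrative inputs)
p.adjust(c(0.001, 0.02, 0.15), method = "bonferroni")
```

Under these assumptions the family-wise rate is about 0.49 for three tests and about 0.74 for six, so a correction matters most for pairs whose p-values are not already very small.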
Sales Price Vs Year Remodeled
This pair of variables yields a very low p-value (< 2.2e-16), so we reject the null hypothesis of zero correlation. The 80% confidence interval for the correlation is 0.482 to 0.532, and the sample estimate is +0.51.
cor.test(corr_data$SalePrice,corr_data$YearRemodAdd, conf.level = 0.8)
##
## Pearson's product-moment correlation
##
## data: corr_data$SalePrice and corr_data$YearRemodAdd
## t = 22.466, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.4817381 0.5316150
## sample estimates:
## cor
## 0.507101
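The interval and t statistic reported by `cor.test()` can be reproduced by hand. The following is a small sketch, assuming the Pearson method's Fisher z-transformation and taking r = 0.507101 and 1460 observations (df + 2) from the output above; the variable names are illustrative.

```r
r     <- 0.507101   # sample correlation from the output above
n_obs <- 1460       # df = 1458, so n = df + 2
conf  <- 0.80

# Fisher z-transformation: atanh(r) is approximately normal with sd 1/sqrt(n - 3)
z    <- atanh(r)
se   <- 1 / sqrt(n_obs - 3)
crit <- qnorm(1 - (1 - conf) / 2)

# Transform the interval endpoints back to the correlation scale
ci <- tanh(z + c(-1, 1) * crit * se)

# t statistic for H0: true correlation equals 0
t_stat <- r * sqrt(n_obs - 2) / sqrt(1 - r^2)

ci
t_stat
```

The endpoints should agree with the reported interval (0.4817381, 0.5316150) and the t value of 22.466 up to rounding of the inputs.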
Sales Price Vs Lot Area
This pair of variables yields a very low p-value (< 2.2e-16), so we reject the null hypothesis of zero correlation. The 80% confidence interval for the correlation is 0.232 to 0.295, and the sample estimate is +0.26.
cor.test(corr_data$SalePrice,corr_data$LotArea, conf.level = 0.8)
##
## Pearson's product-moment correlation
##
## data: corr_data$SalePrice and corr_data$LotArea
## t = 10.445, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.2323391 0.2947946
## sample estimates:
## cor
## 0.2638434
Sales Price Vs Overall Condition
This pair of variables yields a low p-value (0.003), indicating a likely non-zero correlation. The 80% confidence interval for the correlation is -0.111 to -0.044, and the sample estimate is -0.078. This indicates a weak inverse relationship between Sales Price and Overall Condition.
cor.test(corr_data$SalePrice,corr_data$OverallCond, conf.level = 0.8)
##
## Pearson's product-moment correlation
##
## data: corr_data$SalePrice and corr_data$OverallCond
## t = -2.9819, df = 1458, p-value = 0.002912
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## -0.1111272 -0.0444103
## sample estimates:
## cor
## -0.07785589
Sales Price Vs Quality of Heating System
The Heating Quality variable is derived by converting a categorical field to a numeric scale.
This pair of variables yields a low p-value, indicating a likely non-zero correlation, with 80% confidence that the correlation is between 0.400 and 0.455. The sample estimate is +0.428.
cor.test(corr_data$SalePrice,corr_data$HeatingQC_q, conf.level = 0.8)
##
## Pearson's product-moment correlation
##
## data: corr_data$SalePrice and corr_data$HeatingQC_q
## t = 18.064, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.3998256 0.4546844
## sample estimates:
## cor
## 0.4276487
Sales Price Vs Kitchen Quality
The Kitchen Quality variable is derived by converting a categorical field to a numeric scale.
This pair of variables yields a low p-value, indicating a likely non-zero correlation, with 80% confidence that the correlation is between 0.640 and 0.678. The sample estimate is +0.660.
cor.test(corr_data$SalePrice,corr_data$KitchenQual_q, conf.level = 0.8)
##
## Pearson's product-moment correlation
##
## data: corr_data$SalePrice and corr_data$KitchenQual_q
## t = 33.509, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.6402106 0.6781490
## sample estimates:
## cor
## 0.6595997
Sales Price Vs Fence Condition
The Fence Quality variable is derived by converting a categorical field to a numeric scale.
This pair of variables yields a low p-value, indicating a likely non-zero correlation, with 80% confidence that the correlation is between -0.180 and -0.114. The sample estimate is -0.147, which indicates a weak inverse relationship between Sales Price and Fence Condition.
cor.test(corr_data$SalePrice,corr_data$Fence_q, conf.level = 0.8)
##
## Pearson's product-moment correlation
##
## data: corr_data$SalePrice and corr_data$Fence_q
## t = -5.6724, df = 1458, p-value = 1.696e-08
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## -0.1796174 -0.1139418
## sample estimates:
## cor
## -0.1469415
Pairwise Correlation Discussion
All six pairs show a correlation with Sales Price (the dependent variable) that is distinguishable from zero at the 80% confidence level, since none of the intervals contains zero. Kitchen Quality, Year Remodeled and Heating Quality show moderate to strong positive correlations, Lot Area a weak positive one, and Fence Condition and Overall Condition weak inverse ones. Familywise error is not a major concern here: the largest p-value is 0.0029, which stays below 0.05 even after a Bonferroni correction for six tests (0.05/6 ≈ 0.0083).
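To make the familywise-error remark concrete, the sketch below computes the chance of at least one false positive across six tests and applies a Bonferroni correction to the p-values printed above. The four p-values reported as "< 2.2e-16" are entered as 2.2e-16, which is an upper bound rather than the exact value, and independence of the tests is an assumption.

```r
# Familywise error rate (FWER) for m independent tests at level alpha
m <- 6
fwer <- function(alpha, m) 1 - (1 - alpha)^m

fwer_80 <- fwer(0.20, m)   # alpha implied by the 80% confidence level
fwer_95 <- fwer(0.05, m)   # conventional 5% level

# p-values from the six cor.test outputs above
# ("< 2.2e-16" entered as 2.2e-16, an upper bound)
p_values <- c(YearRemodAdd = 2.2e-16, LotArea = 2.2e-16,
              OverallCond = 0.002912, HeatingQC_q = 2.2e-16,
              KitchenQual_q = 2.2e-16, Fence_q = 1.696e-08)

# Bonferroni adjustment: multiply each p-value by m (capped at 1)
p_bonf <- p.adjust(p_values, method = "bonferroni")

print(round(c(fwer_80 = fwer_80, fwer_95 = fwer_95), 4))
print(p_bonf)
```

Even after the adjustment the largest p-value (Overall Condition) stays below 0.05, which is consistent with the discussion above.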
Linear Algebra and Correlation
Inverting the Correlation Matrix to Create Precision Matrix
## The Current Correlation Matrix
corr_matrix
## SalePrice LotArea BsmtFinSF2 GarageArea YearRemodAdd OverallCond
## SalePrice 1.0000 0.2638 -0.0114 0.6234 0.5071 -0.0779
## LotArea 0.2638 1.0000 0.1112 0.1804 0.0138 -0.0056
## BsmtFinSF2 -0.0114 0.1112 1.0000 -0.0182 -0.0678 0.0402
## GarageArea 0.6234 0.1804 -0.0182 1.0000 0.3716 -0.1515
## YearRemodAdd 0.5071 0.0138 -0.0678 0.3716 1.0000 0.0737
## OverallCond -0.0779 -0.0056 0.0402 -0.1515 0.0737 1.0000
## TotalBsmtSF 0.6136 0.2608 0.1048 0.4867 0.2911 -0.1711
## GrLivArea 0.7086 0.2631 -0.0096 0.4690 0.2874 -0.0797
## HeatingQC_q 0.4276 0.0036 -0.0745 0.2955 0.5500 -0.0141
## Electrical_q 0.2236 0.0458 0.0292 0.2141 0.3141 0.0973
## KitchenQual_q 0.6596 0.0679 -0.0451 0.4896 0.6253 -0.0267
## Fence_q -0.1469 -0.0414 0.1153 -0.1228 -0.1411 0.1697
## GarageCond_q 0.2632 0.0761 0.0444 0.5473 0.1441 0.0167
## TotalBsmtSF GrLivArea HeatingQC_q Electrical_q KitchenQual_q
## SalePrice 0.6136 0.7086 0.4276 0.2236 0.6596
## LotArea 0.2608 0.2631 0.0036 0.0458 0.0679
## BsmtFinSF2 0.1048 -0.0096 -0.0745 0.0292 -0.0451
## GarageArea 0.4867 0.4690 0.2955 0.2141 0.4896
## YearRemodAdd 0.2911 0.2874 0.5500 0.3141 0.6253
## OverallCond -0.1711 -0.0797 -0.0141 0.0973 -0.0267
## TotalBsmtSF 1.0000 0.4549 0.2657 0.1803 0.4326
## GrLivArea 0.4549 1.0000 0.2546 0.1211 0.4206
## HeatingQC_q 0.2657 0.2546 1.0000 0.1861 0.5042
## Electrical_q 0.1803 0.1211 0.1861 1.0000 0.2313
## KitchenQual_q 0.4326 0.4206 0.5042 0.2313 1.0000
## Fence_q -0.1094 -0.0784 -0.1803 0.0218 -0.1317
## GarageCond_q 0.1766 0.1533 0.1553 0.1889 0.2321
## Fence_q GarageCond_q
## SalePrice -0.1469 0.2632
## LotArea -0.0414 0.0761
## BsmtFinSF2 0.1153 0.0444
## GarageArea -0.1228 0.5473
## YearRemodAdd -0.1411 0.1441
## OverallCond 0.1697 0.0167
## TotalBsmtSF -0.1094 0.1766
## GrLivArea -0.0784 0.1533
## HeatingQC_q -0.1803 0.1553
## Electrical_q 0.0218 0.1889
## KitchenQual_q -0.1317 0.2321
## Fence_q 1.0000 0.0030
## GarageCond_q 0.0030 1.0000
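## Inverting the correlation matrix gives the precision matrix (rounded here to 4 decimals)
Invert_matrix<-round(solve(corr_matrix),4)
## Multiplying the correlation matrix by the precision matrix should return the identity matrix, up to rounding
## (the objects named precision_matrix_1 and precision_matrix_2 hold these products)
### corr X invert
precision_matrix_1 <- round(corr_matrix %*% Invert_matrix,4)
precision_matrix_1
## SalePrice LotArea BsmtFinSF2 GarageArea YearRemodAdd OverallCond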
## SalePrice 0.9999 -1e-04 0 0e+00 0 -1e-04
## LotArea -0.0001 1e+00 0 0e+00 0 -1e-04
## BsmtFinSF2 0.0000 0e+00 1 0e+00 0 0e+00
## GarageArea -0.0001 -1e-04 0 1e+00 0 -1e-04
## YearRemodAdd -0.0001 0e+00 0 0e+00 1 -1e-04
## OverallCond 0.0000 0e+00 0 0e+00 0 1e+00
## TotalBsmtSF -0.0001 -1e-04 0 0e+00 0 -1e-04
## GrLivArea -0.0001 0e+00 0 0e+00 0 0e+00
## HeatingQC_q 0.0000 0e+00 0 0e+00 0 0e+00
## Electrical_q 0.0000 0e+00 0 0e+00 0 0e+00
## KitchenQual_q -0.0001 -1e-04 0 0e+00 0 -1e-04
## Fence_q 0.0000 0e+00 0 0e+00 0 0e+00
## GarageCond_q -0.0001 -1e-04 0 -1e-04 0 0e+00
## TotalBsmtSF GrLivArea HeatingQC_q Electrical_q KitchenQual_q
## SalePrice 0 0 0 0 0
## LotArea 0 0 0 0 0
## BsmtFinSF2 0 0 0 0 0
## GarageArea 0 0 0 0 0
## YearRemodAdd 0 0 0 0 0
## OverallCond 0 0 0 0 0
## TotalBsmtSF 1 0 0 0 0
## GrLivArea 0 1 0 0 0
## HeatingQC_q 0 0 1 0 0
## Electrical_q 0 0 0 1 0
## KitchenQual_q 0 0 0 0 1
## Fence_q 0 0 0 0 0
## GarageCond_q 0 0 0 0 0
## Fence_q GarageCond_q
## SalePrice 0e+00 -1e-04
## LotArea 0e+00 -1e-04
## BsmtFinSF2 0e+00 0e+00
## GarageArea 0e+00 -1e-04
## YearRemodAdd 0e+00 -1e-04
## OverallCond 0e+00 0e+00
## TotalBsmtSF 0e+00 -1e-04
## GrLivArea 0e+00 -1e-04
## HeatingQC_q 0e+00 -1e-04
## Electrical_q 0e+00 -1e-04
## KitchenQual_q -1e-04 -1e-04
## Fence_q 1e+00 0e+00
## GarageCond_q 0e+00 1e+00
### invert X corr
precision_matrix_2 <- round(Invert_matrix %*% corr_matrix,4)
precision_matrix_2
## SalePrice LotArea BsmtFinSF2 GarageArea YearRemodAdd OverallCond
## SalePrice 0.9999 -1e-04 0 -1e-04 -1e-04 0
## LotArea -0.0001 1e+00 0 -1e-04 0e+00 0
## BsmtFinSF2 0.0000 0e+00 1 0e+00 0e+00 0
## GarageArea 0.0000 0e+00 0 1e+00 0e+00 0
## YearRemodAdd 0.0000 0e+00 0 0e+00 1e+00 0
## OverallCond -0.0001 -1e-04 0 -1e-04 -1e-04 1
## TotalBsmtSF 0.0000 0e+00 0 0e+00 0e+00 0
## GrLivArea 0.0000 0e+00 0 0e+00 0e+00 0
## HeatingQC_q 0.0000 0e+00 0 0e+00 0e+00 0
## Electrical_q 0.0000 0e+00 0 0e+00 0e+00 0
## KitchenQual_q 0.0000 0e+00 0 0e+00 0e+00 0
## Fence_q 0.0000 0e+00 0 0e+00 0e+00 0
## GarageCond_q -0.0001 -1e-04 0 -1e-04 -1e-04 0
## TotalBsmtSF GrLivArea HeatingQC_q Electrical_q KitchenQual_q
## SalePrice -1e-04 -1e-04 0e+00 0e+00 -1e-04
## LotArea -1e-04 0e+00 0e+00 0e+00 -1e-04
## BsmtFinSF2 0e+00 0e+00 0e+00 0e+00 0e+00
## GarageArea 0e+00 0e+00 0e+00 0e+00 0e+00
## YearRemodAdd 0e+00 0e+00 0e+00 0e+00 0e+00
## OverallCond -1e-04 0e+00 0e+00 0e+00 -1e-04
## TotalBsmtSF 1e+00 0e+00 0e+00 0e+00 0e+00
## GrLivArea 0e+00 1e+00 0e+00 0e+00 0e+00
## HeatingQC_q 0e+00 0e+00 1e+00 0e+00 0e+00
## Electrical_q 0e+00 0e+00 0e+00 1e+00 0e+00
## KitchenQual_q 0e+00 0e+00 0e+00 0e+00 1e+00
## Fence_q 0e+00 0e+00 0e+00 0e+00 -1e-04
## GarageCond_q -1e-04 -1e-04 -1e-04 -1e-04 -1e-04
## Fence_q GarageCond_q
## SalePrice 0 -1e-04
## LotArea 0 -1e-04
## BsmtFinSF2 0 0e+00
## GarageArea 0 -1e-04
## YearRemodAdd 0 0e+00
## OverallCond 0 0e+00
## TotalBsmtSF 0 0e+00
## GrLivArea 0 0e+00
## HeatingQC_q 0 0e+00
## Electrical_q 0 0e+00
## KitchenQual_q 0 0e+00
## Fence_q 1 0e+00
## GarageCond_q 0 1e+00
### The LU decomposition of precision_matrix_1 is :
decomp_pm1 <- lu.decomposition(precision_matrix_1)
decomp_pm1
## $L
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1.00000000 0.0000e+00 0 0e+00 0 0.0000e+00 0 0 0 0
## [2,] -0.00010001 1.0000e+00 0 0e+00 0 0.0000e+00 0 0 0 0
## [3,] 0.00000000 0.0000e+00 1 0e+00 0 0.0000e+00 0 0 0 0
## [4,] -0.00010001 -1.0001e-04 0 1e+00 0 0.0000e+00 0 0 0 0
## [5,] -0.00010001 -1.0001e-08 0 0e+00 1 0.0000e+00 0 0 0 0
## [6,] 0.00000000 0.0000e+00 0 0e+00 0 1.0000e+00 0 0 0 0
## [7,] -0.00010001 -1.0001e-04 0 0e+00 0 -1.0002e-04 1 0 0 0
## [8,] -0.00010001 -1.0001e-08 0 0e+00 0 -1.0002e-08 0 1 0 0
## [9,] 0.00000000 0.0000e+00 0 0e+00 0 0.0000e+00 0 0 1 0
## [10,] 0.00000000 0.0000e+00 0 0e+00 0 0.0000e+00 0 0 0 1
## [11,] -0.00010001 -1.0001e-04 0 0e+00 0 -1.0002e-04 0 0 0 0
## [12,] 0.00000000 0.0000e+00 0 0e+00 0 0.0000e+00 0 0 0 0
## [13,] -0.00010001 -1.0001e-04 0 -1e-04 0 -3.0005e-08 0 0 0 0
## [,11] [,12] [,13]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
## [4,] 0 0 0
## [5,] 0 0 0
## [6,] 0 0 0
## [7,] 0 0 0
## [8,] 0 0 0
## [9,] 0 0 0
## [10,] 0 0 0
## [11,] 1 0 0
## [12,] 0 1 0
## [13,] 0 0 1
##
## $U
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] 0.9999 -1e-04 0 0 0 -0.00010000 0 0 0 0 0
## [2,] 0.0000 1e+00 0 0 0 -0.00010001 0 0 0 0 0
## [3,] 0.0000 0e+00 1 0 0 0.00000000 0 0 0 0 0
## [4,] 0.0000 0e+00 0 1 0 -0.00010002 0 0 0 0 0
## [5,] 0.0000 0e+00 0 0 1 -0.00010001 0 0 0 0 0
## [6,] 0.0000 0e+00 0 0 0 1.00000000 0 0 0 0 0
## [7,] 0.0000 0e+00 0 0 0 0.00000000 1 0 0 0 0
## [8,] 0.0000 0e+00 0 0 0 0.00000000 0 1 0 0 0
## [9,] 0.0000 0e+00 0 0 0 0.00000000 0 0 1 0 0
## [10,] 0.0000 0e+00 0 0 0 0.00000000 0 0 0 1 0
## [11,] 0.0000 0e+00 0 0 0 0.00000000 0 0 0 0 1
## [12,] 0.0000 0e+00 0 0 0 0.00000000 0 0 0 0 0
## [13,] 0.0000 0e+00 0 0 0 0.00000000 0 0 0 0 0
## [,12] [,13]
## [1,] 0e+00 -0.00010000
## [2,] 0e+00 -0.00010001
## [3,] 0e+00 0.00000000
## [4,] 0e+00 -0.00010002
## [5,] 0e+00 -0.00010001
## [6,] 0e+00 0.00000000
## [7,] 0e+00 -0.00010002
## [8,] 0e+00 -0.00010001
## [9,] 0e+00 -0.00010000
## [10,] 0e+00 -0.00010000
## [11,] -1e-04 -0.00010002
## [12,] 1e+00 0.00000000
## [13,] 0e+00 0.99999997
### The LU decomposition of precision_matrix_2 is :
decomp_pm2 <- lu.decomposition(precision_matrix_2)
decomp_pm2
## $L
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1.00000000 0.00000000 0 0.00000000 0.00000000 0 0.00000000
## [2,] -0.00010001 1.00000000 0 0.00000000 0.00000000 0 0.00000000
## [3,] 0.00000000 0.00000000 1 0.00000000 0.00000000 0 0.00000000
## [4,] 0.00000000 0.00000000 0 1.00000000 0.00000000 0 0.00000000
## [5,] 0.00000000 0.00000000 0 0.00000000 1.00000000 0 0.00000000
## [6,] -0.00010001 -0.00010001 0 -0.00010002 -0.00010001 1 0.00000000
## [7,] 0.00000000 0.00000000 0 0.00000000 0.00000000 0 1.00000000
## [8,] 0.00000000 0.00000000 0 0.00000000 0.00000000 0 0.00000000
## [9,] 0.00000000 0.00000000 0 0.00000000 0.00000000 0 0.00000000
## [10,] 0.00000000 0.00000000 0 0.00000000 0.00000000 0 0.00000000
## [11,] 0.00000000 0.00000000 0 0.00000000 0.00000000 0 0.00000000
## [12,] 0.00000000 0.00000000 0 0.00000000 0.00000000 0 0.00000000
## [13,] -0.00010001 -0.00010001 0 -0.00010002 -0.00010001 0 -0.00010002
## [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [2,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [3,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [4,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [5,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [6,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [7,] 0.00000000 0e+00 0e+00 0.00000000 0 0
## [8,] 1.00000000 0e+00 0e+00 0.00000000 0 0
## [9,] 0.00000000 1e+00 0e+00 0.00000000 0 0
## [10,] 0.00000000 0e+00 1e+00 0.00000000 0 0
## [11,] 0.00000000 0e+00 0e+00 1.00000000 0 0
## [12,] 0.00000000 0e+00 0e+00 -0.00010000 1 0
## [13,] -0.00010001 -1e-04 -1e-04 -0.00010002 0 1
##
## $U
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 0.9999 -1e-04 0 -0.00010000 -1.0000e-04 0 -0.00010000 -1.0000e-04
## [2,] 0.0000 1e+00 0 -0.00010001 -1.0001e-08 0 -0.00010001 -1.0001e-08
## [3,] 0.0000 0e+00 1 0.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [4,] 0.0000 0e+00 0 1.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [5,] 0.0000 0e+00 0 0.00000000 1.0000e+00 0 0.00000000 0.0000e+00
## [6,] 0.0000 0e+00 0 0.00000000 0.0000e+00 1 -0.00010002 -1.0002e-08
## [7,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 1.00000000 0.0000e+00
## [8,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 0.00000000 1.0000e+00
## [9,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [10,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [11,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [12,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [13,] 0.0000 0e+00 0 0.00000000 0.0000e+00 0 0.00000000 0.0000e+00
## [,9] [,10] [,11] [,12] [,13]
## [1,] 0 0 -0.00010000 0 -1.0000e-04
## [2,] 0 0 -0.00010001 0 -1.0001e-04
## [3,] 0 0 0.00000000 0 0.0000e+00
## [4,] 0 0 0.00000000 0 -1.0000e-04
## [5,] 0 0 0.00000000 0 0.0000e+00
## [6,] 0 0 -0.00010002 0 -3.0005e-08
## [7,] 0 0 0.00000000 0 0.0000e+00
## [8,] 0 0 0.00000000 0 0.0000e+00
## [9,] 1 0 0.00000000 0 0.0000e+00
## [10,] 0 1 0.00000000 0 0.0000e+00
## [11,] 0 0 1.00000000 0 0.0000e+00
## [12,] 0 0 0.00000000 1 0.0000e+00
## [13,] 0 0 0.00000000 0 1.0000e+00
## Comparing the 2 matrices, we see that they are equal (when rounded to 3 decimal places)
round(precision_matrix_1 ,3)== round(precision_matrix_2,3)
## SalePrice LotArea BsmtFinSF2 GarageArea YearRemodAdd OverallCond
## SalePrice TRUE TRUE TRUE TRUE TRUE TRUE
## LotArea TRUE TRUE TRUE TRUE TRUE TRUE
## BsmtFinSF2 TRUE TRUE TRUE TRUE TRUE TRUE
## GarageArea TRUE TRUE TRUE TRUE TRUE TRUE
## YearRemodAdd TRUE TRUE TRUE TRUE TRUE TRUE
## OverallCond TRUE TRUE TRUE TRUE TRUE TRUE
## TotalBsmtSF TRUE TRUE TRUE TRUE TRUE TRUE
## GrLivArea TRUE TRUE TRUE TRUE TRUE TRUE
## HeatingQC_q TRUE TRUE TRUE TRUE TRUE TRUE
## Electrical_q TRUE TRUE TRUE TRUE TRUE TRUE
## KitchenQual_q TRUE TRUE TRUE TRUE TRUE TRUE
## Fence_q TRUE TRUE TRUE TRUE TRUE TRUE
## GarageCond_q TRUE TRUE TRUE TRUE TRUE TRUE
## TotalBsmtSF GrLivArea HeatingQC_q Electrical_q KitchenQual_q
## SalePrice TRUE TRUE TRUE TRUE TRUE
## LotArea TRUE TRUE TRUE TRUE TRUE
## BsmtFinSF2 TRUE TRUE TRUE TRUE TRUE
## GarageArea TRUE TRUE TRUE TRUE TRUE
## YearRemodAdd TRUE TRUE TRUE TRUE TRUE
## OverallCond TRUE TRUE TRUE TRUE TRUE
## TotalBsmtSF TRUE TRUE TRUE TRUE TRUE
## GrLivArea TRUE TRUE TRUE TRUE TRUE
## HeatingQC_q TRUE TRUE TRUE TRUE TRUE
## Electrical_q TRUE TRUE TRUE TRUE TRUE
## KitchenQual_q TRUE TRUE TRUE TRUE TRUE
## Fence_q TRUE TRUE TRUE TRUE TRUE
## GarageCond_q TRUE TRUE TRUE TRUE TRUE
## Fence_q GarageCond_q
## SalePrice TRUE TRUE
## LotArea TRUE TRUE
## BsmtFinSF2 TRUE TRUE
## GarageArea TRUE TRUE
## YearRemodAdd TRUE TRUE
## OverallCond TRUE TRUE
## TotalBsmtSF TRUE TRUE
## GrLivArea TRUE TRUE
## HeatingQC_q TRUE TRUE
## Electrical_q TRUE TRUE
## KitchenQual_q TRUE TRUE
## Fence_q TRUE TRUE
## GarageCond_q TRUE TRUE
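The inverse of a correlation matrix is the precision matrix, and multiplying the two in either order should give the identity up to floating-point (or rounding) error. The sketch below illustrates this on a small made-up 3 x 3 correlation matrix (not the housing data) and uses all.equal, which compares with a tolerance, as an alternative to rounding followed by ==. It also reads the variance inflation factors off the diagonal of the precision matrix.

```r
# Small illustrative correlation matrix (hypothetical values, not from the data)
R <- matrix(c(1.0, 0.6, 0.3,
              0.6, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

P <- solve(R)            # precision matrix = inverse of the correlation matrix

left  <- R %*% P         # correlation x precision
right <- P %*% R         # precision x correlation
I3    <- diag(3)

# Both products equal the identity up to numerical tolerance
same_left  <- isTRUE(all.equal(left,  I3, check.attributes = FALSE))
same_right <- isTRUE(all.equal(right, I3, check.attributes = FALSE))

# The diagonal of the precision matrix holds the variance inflation factors
vif <- diag(P)

print(c(same_left = same_left, same_right = same_right))
print(round(vif, 3))
```

A tolerance-based comparison avoids having to pick a number of decimal places by hand, and VIF values close to 1 indicate little collinearity among the variables.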
Calculus-Based Probability & Statistics
Many times, it makes sense to fit a closed form distribution to data. Select a variable in the Kaggle.com training dataset that is skewed to the right, shift it so that the minimum value is absolutely above zero if necessary
head(corr_data)
skew(corr_data, na.rm = TRUE)
## [1] 1.8790086 12.1826150 4.2465214 0.1796113 -0.5025278 0.6916440
## [7] 1.5211239 1.3637536 -0.5393477 -4.7209627 0.3859710 1.8034393
## [13] -3.3250565
## We see that the "LotArea" field is the most right-skewed, with a value of :
round(skew(corr_data$LotArea, na.rm = TRUE),3)
## [1] 12.183
## A Histogram of the field :
ggplot(corr_data, aes(x=LotArea)) + geom_histogram(color="blue", fill="white", binwidth = 1000)+labs(title="Lot Area plot - Skewness = 12.2 - Min Value = 1300",x="Lot Size", y = "Count")
The Fitting
Then load the MASS package and run fitdistr to fit an exponential probability density function
la_fit <- corr_data$LotArea
summary(la_fit) ### Note that the minimum value is > 0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1300 7554 9478 10517 11602 215245
### Determining the fit
fit <- fitdistr(la_fit, "exponential")
fit
## rate
## 9.508570e-05
## (2.488507e-06)
Find the optimal value of λ for this distribution, and then take 1000 samples from this exponential distribution using this value
# Computing Lambda
lambda_fit <- fit$estimate
lambda_fit
## rate
## 9.50857e-05
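For an exponential distribution the maximum-likelihood estimate of the rate has a closed form, λ = 1 / mean(x), which is why the fitted rate above is the reciprocal of the sample mean (1 / 10517 ≈ 9.51e-05). A minimal check on simulated data, assuming the MASS package is installed; the sample, seed and variable names are arbitrary choices for illustration:

```r
library(MASS)  # provides fitdistr()

set.seed(1)                          # arbitrary seed for reproducibility
x <- rexp(500, rate = 0.002)         # simulated stand-in data

fit_sim      <- fitdistr(x, "exponential")
rate_mle     <- unname(fit_sim$estimate)   # rate estimated by fitdistr
rate_by_hand <- 1 / mean(x)                # closed-form MLE

print(c(fitdistr = rate_mle, by_hand = rate_by_hand))
```

The two numbers should coincide, so the "optimal" λ can also be read directly from the mean of the data.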
### Generating new distribution and Histogram
new_dist <- rexp(1000, lambda_fit)
summary(new_dist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.85 2935.45 7905.62 10876.57 15249.17 81694.21
hist(new_dist,breaks = 100)
Plot histogram and compare it with original histogram
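Note: As shown below, because the fitted rate is the reciprocal of the sample mean, the simulated exponential sample has a mean close to that of “LotArea” (about 10,877 versus 10,517). The shapes differ, however: the exponential density is highest near zero and decays steadily, while “LotArea” has a minimum of 1,300, is concentrated around its median of 9,478, and has a much longer right tail (maximum 215,245 versus about 81,694 in the simulated sample).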
fit_df <- data.frame(length = la_fit)
new_dist_df <- data.frame(length = new_dist)
fit_df$from <- 'Fit'
new_dist_df$from <- 'New Dist'
both_df <- rbind(fit_df,new_dist_df)
ggplot(both_df, aes(length, fill = from)) + geom_density(alpha = 0.5)
Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.
The Exponential PDF is given as \(f(x;\lambda) = \lambda e^{-\lambda x}\) for \(x \geq 0\)
The CDF is given as \(F(x;\lambda) = 1 - e^{-\lambda x}\)
\(\lambda\) is given as : 9.50857e-05
To find the \(5^{th}\) percentile we solve for x in :
\(0.05 = 1 - e^{-\lambda x}\)
\(\implies\) \(e^{-\lambda x} = 0.95\)
\(\implies\) \(-\ln(0.95) = \lambda x\)
\(\implies\) \(x = \frac{-\ln(0.95)}{\lambda}\)
To find the \(95^{th}\) percentile we solve for x in :
\(0.95 = 1 - e^{-\lambda x}\)
\(\implies\) \(e^{-\lambda x} = 0.05\)
\(\implies\) \(-\ln(0.05) = \lambda x\)
\(\implies\) \(x = \frac{-\ln(0.05)}{\lambda}\)
percent_5th <- round((-log(0.95)/lambda_fit),4)
cat("The 5th Percentile is given as : ", "\n", (percent_5th))
## The 5th Percentile is given as :
## 539.4428
percent_95th <- round((-log(0.05)/lambda_fit),4)
cat("The 95th Percentile is given as : ", "\n", (percent_95th))
## The 95th Percentile is given as :
## 31505.6
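The closed-form percentiles can be cross-checked against R's built-in exponential quantile function qexp. The sketch below uses the fitted rate printed above (9.50857e-05), typed in by hand as rate_hat, a name introduced only for this example.

```r
rate_hat <- 9.50857e-05   # fitted rate copied from the fitdistr output

# Closed-form inverse of the exponential CDF: x = -log(1 - p) / rate
p <- c(0.05, 0.95)
by_formula <- -log(1 - p) / rate_hat

# Built-in quantile function for the exponential distribution
by_qexp <- qexp(p, rate = rate_hat)

print(round(by_formula, 1))
print(round(by_qexp, 1))
```

Both approaches should return the same two values, matching the 5th and 95th percentiles computed above.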
## To compute an interval from the empirical data, assuming normality (5th and 95th percentiles of normal draws with the sample mean and sd)
mean_la_fit <-mean(la_fit)
p2_norm<-rnorm(length(la_fit),mean(la_fit),sd(la_fit))
cat("The 95th confidence interval from the data is given as : ", "\n",(quantile(p2_norm, probs=c(0.05, 0.95))))
## The 95th confidence interval from the data is given as :
## -5525.364 27175.39
# The Histogram of the distribution is :
hist(p2_norm)
## The empirical 5th percentile and 95th percentile of the data is given as :
quantile(la_fit, c(0.05, 0.95))
## 5% 95%
## 3311.70 17401.15
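The quantile call on simulated normal draws above gives the 5th and 95th percentiles of a normal distribution with the sample mean and standard deviation, i.e. a central 90% range for individual values. If the prompt is read as asking for a 95% confidence interval for the mean, a common approach is the t-interval sketched below. The vector x is simulated stand-in data, not LotArea, and the helper name mean_ci is hypothetical.

```r
# 95% confidence interval for a mean, assuming approximate normality
mean_ci <- function(x, conf_level = 0.95) {
  n      <- length(x)
  se     <- sd(x) / sqrt(n)                      # standard error of the mean
  t_crit <- qt((1 + conf_level) / 2, df = n - 1) # two-sided t critical value
  mean(x) + c(lower = -1, upper = 1) * t_crit * se
}

set.seed(42)                                     # arbitrary seed
x <- rexp(1460, rate = 1 / 10000)                # stand-in for a skewed variable

ci <- mean_ci(x)
print(ci)

# Same interval from base R's t.test, as a cross-check
print(t.test(x, conf.level = 0.95)$conf.int)
```

Because the standard error shrinks with the square root of the sample size, such an interval is much narrower than the percentile range of the data itself.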
Modeling
Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis
## Overview of datasets including the additional converted categorical columns.
# Printing the enhanced data frame with new quantitative columns
# ------ Training Dataset
p2_train %>% select(order(colnames(p2_train)))
str(p2_train)
## 'data.frame': 1460 obs. of 88 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MSSubClass : int 60 20 60 70 60 50 20 60 50 190 ...
## $ MSZoning : chr "RL" "RL" "RL" "RL" ...
## $ LotFrontage : num 65 80 68 60 84 85 75 0 51 50 ...
## $ LotArea : int 8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr "0" "0" "0" "0" ...
## $ LotShape : chr "Reg" "Reg" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "FR2" "Inside" "Corner" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
## $ Condition1 : chr "Norm" "Feedr" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "2Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 7 6 7 7 8 5 8 7 7 5 ...
## $ OverallCond : int 5 8 5 5 5 5 5 6 5 6 ...
## $ YearBuilt : int 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
## $ YearRemodAdd : int 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
## $ RoofStyle : chr "Gable" "Gable" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
## $ Exterior2nd : chr "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
## $ MasVnrType : chr "BrkFace" "None" "BrkFace" "None" ...
## $ MasVnrArea : num 196 0 162 0 350 0 186 240 0 0 ...
## $ ExterQual : chr "Gd" "TA" "Gd" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "PConc" "CBlock" "PConc" "BrkTil" ...
## $ BsmtQual : chr "Gd" "Gd" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "Gd" ...
## $ BsmtExposure : chr "No" "Gd" "Mn" "No" ...
## $ BsmtFinType1 : chr "GLQ" "ALQ" "GLQ" "ALQ" ...
## $ BsmtFinSF1 : int 706 978 486 216 655 732 1369 859 0 851 ...
## $ BsmtFinType2 : chr "Unf" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : int 0 0 0 0 0 0 0 32 0 0 ...
## $ BsmtUnfSF : int 150 284 434 540 490 64 317 216 952 140 ...
## $ TotalBsmtSF : int 856 1262 920 756 1145 796 1686 1107 952 991 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "Ex" "Ex" "Ex" "Gd" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 856 1262 920 961 1145 796 1694 1107 1022 1077 ...
## $ X2ndFlrSF : int 854 0 866 756 1053 566 0 983 752 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
## $ BsmtFullBath : int 1 0 1 1 1 1 1 1 0 1 ...
## $ BsmtHalfBath : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 2 2 2 1 2 1 2 2 2 1 ...
## $ HalfBath : int 1 0 1 0 1 1 0 1 0 0 ...
## $ BedroomAbvGr : int 3 3 3 3 4 1 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 2 2 ...
## $ KitchenQual : chr "Gd" "TA" "Gd" "Gd" ...
## $ TotRmsAbvGrd : int 8 6 6 7 9 5 7 7 8 5 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 1 1 1 1 0 1 2 2 2 ...
## $ FireplaceQu : chr "0" "TA" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Detchd" ...
## $ GarageYrBlt : num 2003 1976 2001 1998 2000 ...
## $ GarageFinish : chr "RFn" "RFn" "RFn" "Unf" ...
## $ GarageCars : int 2 2 2 3 3 2 2 2 2 1 ...
## $ GarageArea : int 548 460 608 642 836 480 636 484 468 205 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 0 298 0 0 192 40 255 235 90 0 ...
## $ OpenPorchSF : int 61 0 42 35 84 30 57 204 0 4 ...
## $ EnclosedPorch : int 0 0 0 272 0 0 0 228 205 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 320 0 0 0 0 ...
## $ ScreenPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr "0" "0" "0" "0" ...
## $ Fence : chr "0" "0" "0" "0" ...
## $ MiscFeature : chr "0" "0" "0" "0" ...
## $ MiscVal : int 0 0 0 0 0 700 0 350 0 0 ...
## $ MoSold : int 2 5 9 2 12 10 8 11 4 1 ...
## $ YrSold : int 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition : chr "Normal" "Normal" "Normal" "Abnorml" ...
## $ SalePrice : int 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
## $ Foundation_q : num 4 5 4 6 4 1 4 5 6 6 ...
## $ BsmtFinType2_q: num 1 1 1 1 1 1 1 4 1 1 ...
## $ HeatingQC_q : num 5 5 5 4 5 5 5 5 4 5 ...
## $ Electrical_q : num 5 5 5 5 5 5 5 5 3 5 ...
## $ KitchenQual_q : num 4 3 4 4 4 3 4 3 3 3 ...
## $ GarageCond_q : num 3 3 3 3 3 3 3 3 3 3 ...
## $ Fence_q : num 0 0 0 0 0 3 0 0 0 0 ...
kable(head(p2_train))
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | Foundation_q | BsmtFinType2_q | HeatingQC_q | Electrical_q | KitchenQual_q | GarageCond_q | Fence_q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60 | RL | 65 | 8450 | Pave | 0 | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | 0 | Attchd | 2003 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | WD | Normal | 208500 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 2 | 20 | RL | 80 | 9600 | Pave | 0 | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | WD | Normal | 181500 | 5 | 1 | 5 | 5 | 3 | 3 | 0 |
| 3 | 60 | RL | 68 | 11250 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | WD | Normal | 223500 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 4 | 70 | RL | 60 | 9550 | Pave | 0 | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2006 | WD | Abnorml | 140000 | 6 | 1 | 4 | 5 | 4 | 3 | 0 |
| 5 | 60 | RL | 84 | 14260 | Pave | 0 | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | WD | Normal | 250000 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 6 | 50 | RL | 85 | 14115 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | Mitchel | Norm | Norm | 1Fam | 1.5Fin | 5 | 5 | 1993 | 1995 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | Wood | Gd | TA | No | GLQ | 732 | Unf | 0 | 64 | 796 | GasA | Ex | Y | SBrkr | 796 | 566 | 0 | 1362 | 1 | 0 | 1 | 1 | 1 | 1 | TA | 5 | Typ | 0 | 0 | Attchd | 1993 | Unf | 2 | 480 | TA | TA | Y | 40 | 30 | 0 | 320 | 0 | 0 | 0 | MnPrv | Shed | 700 | 10 | 2009 | WD | Normal | 143000 | 1 | 1 | 5 | 5 | 3 | 3 | 3 |
# ------ Test Dataset
p2_test %>% select(order(colnames(p2_test)))
str(p2_test)
## 'data.frame': 1459 obs. of 88 variables:
## $ Id : int 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 ...
## $ MSSubClass : int 20 20 60 60 120 60 20 60 20 20 ...
## $ MSZoning : chr "RH" "RL" "RL" "RL" ...
## $ LotFrontage : num 80 81 74 78 43 75 0 63 85 70 ...
## $ LotArea : int 11622 14267 13830 9978 5005 10000 7980 8402 10176 8400 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr "0" "0" "0" "0" ...
## $ LotShape : chr "Reg" "IR1" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "Corner" "Inside" "Inside" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "NAmes" "NAmes" "Gilbert" "Gilbert" ...
## $ Condition1 : chr "Feedr" "Norm" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "1Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 5 6 5 6 8 6 6 6 7 4 ...
## $ OverallCond : int 6 6 5 6 5 5 7 5 5 5 ...
## $ YearBuilt : int 1961 1958 1997 1998 1992 1993 1992 1998 1990 1970 ...
## $ YearRemodAdd : int 1961 1958 1998 1998 1992 1994 2007 1998 1990 1970 ...
## $ RoofStyle : chr "Gable" "Hip" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "Wd Sdng" "VinylSd" "VinylSd" ...
## $ Exterior2nd : chr "VinylSd" "Wd Sdng" "VinylSd" "VinylSd" ...
## $ MasVnrType : chr "None" "BrkFace" "None" "BrkFace" ...
## $ MasVnrArea : num 0 108 0 20 0 0 0 0 0 0 ...
## $ ExterQual : chr "TA" "TA" "TA" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "CBlock" "CBlock" "PConc" "PConc" ...
## $ BsmtQual : chr "TA" "TA" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "TA" ...
## $ BsmtExposure : chr "No" "No" "No" "No" ...
## $ BsmtFinType1 : chr "Rec" "ALQ" "GLQ" "GLQ" ...
## $ BsmtFinSF1 : num 468 923 791 602 263 0 935 0 637 804 ...
## $ BsmtFinType2 : chr "LwQ" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : num 144 0 0 0 0 0 0 0 0 78 ...
## $ BsmtUnfSF : num 270 406 137 324 1017 ...
## $ TotalBsmtSF : num 882 1329 928 926 1280 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "TA" "TA" "Gd" "Ex" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 896 1329 928 926 1280 763 1187 789 1341 882 ...
## $ X2ndFlrSF : int 0 0 701 678 0 892 0 676 0 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 896 1329 1629 1604 1280 1655 1187 1465 1341 882 ...
## $ BsmtFullBath : num 0 0 0 0 0 0 1 0 1 1 ...
## $ BsmtHalfBath : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 1 1 2 2 2 2 2 2 1 1 ...
## $ HalfBath : int 0 1 1 1 0 1 0 1 1 0 ...
## $ BedroomAbvGr : int 2 3 3 3 2 3 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 1 1 ...
## $ KitchenQual : chr "TA" "Gd" "TA" "Gd" ...
## $ TotRmsAbvGrd : int 5 6 6 7 5 7 6 7 5 4 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 0 1 1 0 1 0 1 1 0 ...
## $ FireplaceQu : chr "0" "0" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Attchd" ...
## $ GarageYrBlt : num 1961 1958 1997 1998 1992 ...
## $ GarageFinish : chr "Unf" "Unf" "Fin" "Fin" ...
## $ GarageCars : num 1 1 2 2 2 2 2 2 2 2 ...
## $ GarageArea : num 730 312 482 470 506 440 420 393 506 525 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 140 393 212 360 0 157 483 0 192 240 ...
## $ OpenPorchSF : int 0 36 34 36 82 84 21 75 0 0 ...
## $ EnclosedPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ScreenPorch : int 120 0 0 0 144 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr "0" "0" "0" "0" ...
## $ Fence : chr "MnPrv" "0" "MnPrv" "0" ...
## $ MiscFeature : chr "0" "Gar2" "0" "0" ...
## $ MiscVal : int 0 12500 0 0 0 0 500 0 0 0 ...
## $ MoSold : int 6 6 3 6 1 4 3 5 2 4 ...
## $ YrSold : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition : chr "Normal" "Normal" "Normal" "Normal" ...
## $ SalePrice : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Foundation_q : num 5 5 4 4 4 4 4 4 4 5 ...
## $ BsmtFinType2_q: num 2 1 1 1 1 1 1 1 1 3 ...
## $ HeatingQC_q : num 3 3 4 5 5 4 5 4 4 3 ...
## $ Electrical_q : num 5 5 5 5 5 5 5 5 5 5 ...
## $ KitchenQual_q : num 3 4 3 4 4 3 3 3 4 3 ...
## $ GarageCond_q : num 3 3 3 3 3 3 3 3 3 3 ...
## $ Fence_q : num 3 0 3 0 0 0 4 0 0 3 ...
kable(head(p2_test))
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | Foundation_q | BsmtFinType2_q | HeatingQC_q | Electrical_q | KitchenQual_q | GarageCond_q | Fence_q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1461 | 20 | RH | 80 | 11622 | Pave | 0 | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | CBlock | TA | TA | No | Rec | 468 | LwQ | 144 | 270 | 882 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 0 | 0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | 0 | Attchd | 1961 | Unf | 1 | 730 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | 0 | MnPrv | 0 | 0 | 6 | 2010 | WD | Normal | 0 | 5 | 2 | 3 | 5 | 3 | 3 | 3 |
| 1462 | 20 | RL | 81 | 14267 | Pave | 0 | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 6 | 1958 | 1958 | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | 108 | TA | TA | CBlock | TA | TA | No | ALQ | 923 | Unf | 0 | 406 | 1329 | GasA | TA | Y | SBrkr | 1329 | 0 | 0 | 1329 | 0 | 0 | 1 | 1 | 3 | 1 | Gd | 6 | Typ | 0 | 0 | Attchd | 1958 | Unf | 1 | 312 | TA | TA | Y | 393 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | Gar2 | 12500 | 6 | 2010 | WD | Normal | 0 | 5 | 1 | 3 | 5 | 4 | 3 | 0 |
| 1463 | 60 | RL | 74 | 13830 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 5 | 5 | 1997 | 1998 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | PConc | Gd | TA | No | GLQ | 791 | Unf | 0 | 137 | 928 | GasA | Gd | Y | SBrkr | 928 | 701 | 0 | 1629 | 0 | 0 | 2 | 1 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1997 | Fin | 2 | 482 | TA | TA | Y | 212 | 34 | 0 | 0 | 0 | 0 | 0 | MnPrv | 0 | 0 | 3 | 2010 | WD | Normal | 0 | 4 | 1 | 4 | 5 | 3 | 3 | 3 |
| 1464 | 60 | RL | 78 | 9978 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 6 | 1998 | 1998 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 20 | TA | TA | PConc | TA | TA | No | GLQ | 602 | Unf | 0 | 324 | 926 | GasA | Ex | Y | SBrkr | 926 | 678 | 0 | 1604 | 0 | 0 | 2 | 1 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 1998 | Fin | 2 | 470 | TA | TA | Y | 360 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 2010 | WD | Normal | 0 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 1465 | 120 | RL | 43 | 5005 | Pave | 0 | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | 8 | 5 | 1992 | 1992 | Gable | CompShg | HdBoard | HdBoard | None | 0 | Gd | TA | PConc | Gd | TA | No | ALQ | 263 | Unf | 0 | 1017 | 1280 | GasA | Ex | Y | SBrkr | 1280 | 0 | 0 | 1280 | 0 | 0 | 2 | 0 | 2 | 1 | Gd | 5 | Typ | 0 | 0 | Attchd | 1992 | RFn | 2 | 506 | TA | TA | Y | 0 | 82 | 0 | 0 | 144 | 0 | 0 | 0 | 0 | 0 | 1 | 2010 | WD | Normal | 0 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 1466 | 60 | RL | 75 | 10000 | Pave | 0 | IR1 | Lvl | AllPub | Corner | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 5 | 1993 | 1994 | Gable | CompShg | HdBoard | HdBoard | None | 0 | TA | TA | PConc | Gd | TA | No | Unf | 0 | Unf | 0 | 763 | 763 | GasA | Gd | Y | SBrkr | 763 | 892 | 0 | 1655 | 0 | 0 | 2 | 1 | 3 | 1 | TA | 7 | Typ | 1 | TA | Attchd | 1993 | Fin | 2 | 440 | TA | TA | Y | 157 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2010 | WD | Normal | 0 | 4 | 1 | 4 | 5 | 3 | 3 | 0 |
Preparing the Training dataset by removing the non-numerical columns
# Select only the numeric columns
p2_train_num <- p2_train %>%
dplyr::select_if(is.numeric)
# Drop the "Id" and "SalePrice" fields, since neither is a predictor variable
p2_train_vars <- subset(p2_train_num, select = -c(Id,SalePrice))
# Check for missing values in the data
colSums(is.na(p2_train_vars))
## MSSubClass LotFrontage LotArea OverallQual OverallCond
## 0 0 0 0 0
## YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2
## 0 0 0 0 0
## BsmtUnfSF TotalBsmtSF X1stFlrSF X2ndFlrSF LowQualFinSF
## 0 0 0 0 0
## GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath
## 0 0 0 0 0
## BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt
## 0 0 0 0 0
## GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch
## 0 0 0 0 0
## X3SsnPorch ScreenPorch PoolArea MiscVal MoSold
## 0 0 0 0 0
## YrSold Foundation_q BsmtFinType2_q HeatingQC_q Electrical_q
## 0 0 0 0 0
## KitchenQual_q GarageCond_q Fence_q
## 0 0 0
## Reviewing the structure of the enhanced dataset
str(p2_train_vars)
## 'data.frame': 1460 obs. of 43 variables:
## $ MSSubClass : int 60 20 60 70 60 50 20 60 50 190 ...
## $ LotFrontage : num 65 80 68 60 84 85 75 0 51 50 ...
## $ LotArea : int 8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
## $ OverallQual : int 7 6 7 7 8 5 8 7 7 5 ...
## $ OverallCond : int 5 8 5 5 5 5 5 6 5 6 ...
## $ YearBuilt : int 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
## $ YearRemodAdd : int 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
## $ MasVnrArea : num 196 0 162 0 350 0 186 240 0 0 ...
## $ BsmtFinSF1 : int 706 978 486 216 655 732 1369 859 0 851 ...
## $ BsmtFinSF2 : int 0 0 0 0 0 0 0 32 0 0 ...
## $ BsmtUnfSF : int 150 284 434 540 490 64 317 216 952 140 ...
## $ TotalBsmtSF : int 856 1262 920 756 1145 796 1686 1107 952 991 ...
## $ X1stFlrSF : int 856 1262 920 961 1145 796 1694 1107 1022 1077 ...
## $ X2ndFlrSF : int 854 0 866 756 1053 566 0 983 752 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
## $ BsmtFullBath : int 1 0 1 1 1 1 1 1 0 1 ...
## $ BsmtHalfBath : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 2 2 2 1 2 1 2 2 2 1 ...
## $ HalfBath : int 1 0 1 0 1 1 0 1 0 0 ...
## $ BedroomAbvGr : int 3 3 3 3 4 1 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 2 2 ...
## $ TotRmsAbvGrd : int 8 6 6 7 9 5 7 7 8 5 ...
## $ Fireplaces : int 0 1 1 1 1 0 1 2 2 2 ...
## $ GarageYrBlt : num 2003 1976 2001 1998 2000 ...
## $ GarageCars : int 2 2 2 3 3 2 2 2 2 1 ...
## $ GarageArea : int 548 460 608 642 836 480 636 484 468 205 ...
## $ WoodDeckSF : int 0 298 0 0 192 40 255 235 90 0 ...
## $ OpenPorchSF : int 61 0 42 35 84 30 57 204 0 4 ...
## $ EnclosedPorch : int 0 0 0 272 0 0 0 228 205 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 320 0 0 0 0 ...
## $ ScreenPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ MiscVal : int 0 0 0 0 0 700 0 350 0 0 ...
## $ MoSold : int 2 5 9 2 12 10 8 11 4 1 ...
## $ YrSold : int 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
## $ Foundation_q : num 4 5 4 6 4 1 4 5 6 6 ...
## $ BsmtFinType2_q: num 1 1 1 1 1 1 1 4 1 1 ...
## $ HeatingQC_q : num 5 5 5 4 5 5 5 5 4 5 ...
## $ Electrical_q : num 5 5 5 5 5 5 5 5 3 5 ...
## $ KitchenQual_q : num 4 3 4 4 4 3 4 3 3 3 ...
## $ GarageCond_q : num 3 3 3 3 3 3 3 3 3 3 ...
## $ Fence_q : num 0 0 0 0 0 3 0 0 0 0 ...
dim(p2_train_vars)
## [1] 1460 43
kable(head(p2_train_vars))
| MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | Foundation_q | BsmtFinType2_q | HeatingQC_q | Electrical_q | KitchenQual_q | GarageCond_q | Fence_q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 65 | 8450 | 7 | 5 | 2003 | 2003 | 196 | 706 | 0 | 150 | 856 | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | 8 | 0 | 2003 | 2 | 548 | 0 | 61 | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 20 | 80 | 9600 | 6 | 8 | 1976 | 1976 | 0 | 978 | 0 | 284 | 1262 | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | 6 | 1 | 1976 | 2 | 460 | 298 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | 5 | 1 | 5 | 5 | 3 | 3 | 0 |
| 60 | 68 | 11250 | 7 | 5 | 2001 | 2002 | 162 | 486 | 0 | 434 | 920 | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | 6 | 1 | 2001 | 2 | 608 | 0 | 42 | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 70 | 60 | 9550 | 7 | 5 | 1915 | 1970 | 0 | 216 | 0 | 540 | 756 | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | 7 | 1 | 1998 | 3 | 642 | 0 | 35 | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | 6 | 1 | 4 | 5 | 4 | 3 | 0 |
| 60 | 84 | 14260 | 8 | 5 | 2000 | 2000 | 350 | 655 | 0 | 490 | 1145 | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | 9 | 1 | 2000 | 3 | 836 | 192 | 84 | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 50 | 85 | 14115 | 5 | 5 | 1993 | 1995 | 0 | 732 | 0 | 64 | 796 | 796 | 566 | 0 | 1362 | 1 | 0 | 1 | 1 | 1 | 1 | 5 | 0 | 1993 | 2 | 480 | 40 | 30 | 0 | 320 | 0 | 0 | 700 | 10 | 2009 | 1 | 1 | 5 | 5 | 3 | 3 | 3 |
Since our initial data preparation yielded 43 numerical variables that are eligible for inclusion in the linear model, we will use additional tools to narrow the selection to the variables that give the best fit. The two checks to be employed are:
Test for multicollinearity among the predictors
Test for correlation with the predicted variable, SalePrice
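The second check, correlation with the response, can be sketched in a few lines of base R. The data below are synthetic and the column names (strong, weak, noise, price) are hypothetical; the sketch only illustrates ranking predictors by the absolute size of their correlation with the response, not the result for the housing data.

```r
# Toy data only: names and values are hypothetical, not the housing data.
set.seed(1)
n <- 200
toy <- data.frame(
  strong = rnorm(n),
  weak   = rnorm(n),
  noise  = rnorm(n)
)
toy$price <- 3 * toy$strong + 0.5 * toy$weak + rnorm(n)

# Correlation of every predictor with the response, sorted by absolute size
predictors <- setdiff(names(toy), "price")
cors   <- sapply(toy[predictors], function(x) cor(x, toy$price))
ranked <- sort(abs(cors), decreasing = TRUE)
print(round(ranked, 3))
```

Predictors near the top of the ranking are candidates to keep; those near the bottom add little linear information about the response on their own.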
## Summary of the dataset:
summary(p2_train_vars)
## MSSubClass LotFrontage LotArea OverallQual
## Min. : 20.0 Min. : 0.00 Min. : 1300 Min. : 1.000
## 1st Qu.: 20.0 1st Qu.: 42.00 1st Qu.: 7554 1st Qu.: 5.000
## Median : 50.0 Median : 63.00 Median : 9478 Median : 6.000
## Mean : 56.9 Mean : 57.62 Mean : 10517 Mean : 6.099
## 3rd Qu.: 70.0 3rd Qu.: 79.00 3rd Qu.: 11602 3rd Qu.: 7.000
## Max. :190.0 Max. :313.00 Max. :215245 Max. :10.000
## OverallCond YearBuilt YearRemodAdd MasVnrArea
## Min. :1.000 Min. :1872 Min. :1950 Min. : 0.0
## 1st Qu.:5.000 1st Qu.:1954 1st Qu.:1967 1st Qu.: 0.0
## Median :5.000 Median :1973 Median :1994 Median : 0.0
## Mean :5.575 Mean :1971 Mean :1985 Mean : 103.1
## 3rd Qu.:6.000 3rd Qu.:2000 3rd Qu.:2004 3rd Qu.: 164.2
## Max. :9.000 Max. :2010 Max. :2010 Max. :1600.0
## BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF
## Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 223.0 1st Qu.: 795.8
## Median : 383.5 Median : 0.00 Median : 477.5 Median : 991.5
## Mean : 443.6 Mean : 46.55 Mean : 567.2 Mean :1057.4
## 3rd Qu.: 712.2 3rd Qu.: 0.00 3rd Qu.: 808.0 3rd Qu.:1298.2
## Max. :5644.0 Max. :1474.00 Max. :2336.0 Max. :6110.0
## X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea
## Min. : 334 Min. : 0 Min. : 0.000 Min. : 334
## 1st Qu.: 882 1st Qu.: 0 1st Qu.: 0.000 1st Qu.:1130
## Median :1087 Median : 0 Median : 0.000 Median :1464
## Mean :1163 Mean : 347 Mean : 5.845 Mean :1515
## 3rd Qu.:1391 3rd Qu.: 728 3rd Qu.: 0.000 3rd Qu.:1777
## Max. :4692 Max. :2065 Max. :572.000 Max. :5642
## BsmtFullBath BsmtHalfBath FullBath HalfBath
## Min. :0.0000 Min. :0.00000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.000 1st Qu.:0.0000
## Median :0.0000 Median :0.00000 Median :2.000 Median :0.0000
## Mean :0.4253 Mean :0.05753 Mean :1.565 Mean :0.3829
## 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :3.0000 Max. :2.00000 Max. :3.000 Max. :2.0000
## BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces
## Min. :0.000 Min. :0.000 Min. : 2.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.: 5.000 1st Qu.:0.000
## Median :3.000 Median :1.000 Median : 6.000 Median :1.000
## Mean :2.866 Mean :1.047 Mean : 6.518 Mean :0.613
## 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.: 7.000 3rd Qu.:1.000
## Max. :8.000 Max. :3.000 Max. :14.000 Max. :3.000
## GarageYrBlt GarageCars GarageArea WoodDeckSF
## Min. : 0 Min. :0.000 Min. : 0.0 Min. : 0.00
## 1st Qu.:1958 1st Qu.:1.000 1st Qu.: 334.5 1st Qu.: 0.00
## Median :1977 Median :2.000 Median : 480.0 Median : 0.00
## Mean :1869 Mean :1.767 Mean : 473.0 Mean : 94.24
## 3rd Qu.:2001 3rd Qu.:2.000 3rd Qu.: 576.0 3rd Qu.:168.00
## Max. :2010 Max. :4.000 Max. :1418.0 Max. :857.00
## OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 25.00 Median : 0.00 Median : 0.00 Median : 0.00
## Mean : 46.66 Mean : 21.95 Mean : 3.41 Mean : 15.06
## 3rd Qu.: 68.00 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :547.00 Max. :552.00 Max. :508.00 Max. :480.00
## PoolArea MiscVal MoSold YrSold
## Min. : 0.000 Min. : 0.00 Min. : 1.000 Min. :2006
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 5.000 1st Qu.:2007
## Median : 0.000 Median : 0.00 Median : 6.000 Median :2008
## Mean : 2.759 Mean : 43.49 Mean : 6.322 Mean :2008
## 3rd Qu.: 0.000 3rd Qu.: 0.00 3rd Qu.: 8.000 3rd Qu.:2009
## Max. :738.000 Max. :15500.00 Max. :12.000 Max. :2010
## Foundation_q BsmtFinType2_q HeatingQC_q Electrical_q
## Min. :1.000 Min. :0.000 Min. :1.000 Min. :0.000
## 1st Qu.:4.000 1st Qu.:1.000 1st Qu.:3.000 1st Qu.:5.000
## Median :5.000 Median :1.000 Median :5.000 Median :5.000
## Mean :4.603 Mean :1.247 Mean :4.145 Mean :4.886
## 3rd Qu.:5.000 3rd Qu.:1.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :6.000 Max. :6.000 Max. :5.000 Max. :5.000
## KitchenQual_q GarageCond_q Fence_q
## Min. :2.000 Min. :0.000 Min. :0.0000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:0.0000
## Median :3.000 Median :3.000 Median :0.0000
## Mean :3.512 Mean :2.809 Mean :0.5658
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:0.0000
## Max. :5.000 Max. :5.000 Max. :4.0000
## Checking for null values
p2_train_vars[!complete.cases(p2_train_vars),]
## Predictor variables included in the initial regression:
sort(colnames(p2_train_vars))
## [1] "BedroomAbvGr" "BsmtFinSF1" "BsmtFinSF2" "BsmtFinType2_q"
## [5] "BsmtFullBath" "BsmtHalfBath" "BsmtUnfSF" "Electrical_q"
## [9] "EnclosedPorch" "Fence_q" "Fireplaces" "Foundation_q"
## [13] "FullBath" "GarageArea" "GarageCars" "GarageCond_q"
## [17] "GarageYrBlt" "GrLivArea" "HalfBath" "HeatingQC_q"
## [21] "KitchenAbvGr" "KitchenQual_q" "LotArea" "LotFrontage"
## [25] "LowQualFinSF" "MasVnrArea" "MiscVal" "MoSold"
## [29] "MSSubClass" "OpenPorchSF" "OverallCond" "OverallQual"
## [33] "PoolArea" "ScreenPorch" "TotalBsmtSF" "TotRmsAbvGrd"
## [37] "WoodDeckSF" "X1stFlrSF" "X2ndFlrSF" "X3SsnPorch"
## [41] "YearBuilt" "YearRemodAdd" "YrSold"
Regression Model Fitting
We will fit a sequence of regression models, adjusting the set of predictor variables at each step to improve the fit.
Regression Modeling V1 and V2
Note that Linear Model v2 gives Multiple \(R^{2} = 0.8234\) and Adjusted \(R^{2} = 0.8183\). These match v1 exactly, because v2 only drops the two predictors whose coefficients v1 could not estimate.
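The model summaries in this section print two figures, a multiple and an adjusted \(R^{2}\). As a minimal sketch of how they relate, the adjusted value can be recomputed from the multiple value, the number of observations and the number of estimated slopes. The inputs are read from the printed summary for models v1 and v2 (1460 rows, 41 slopes, 1418 residual degrees of freedom); the variable names are only illustrative.

```r
# Values read from the printed model summary
n  <- 1460    # observations
p  <- 41      # predictors with an estimated coefficient
r2 <- 0.8234  # Multiple R-squared, rounded as printed

# Adjusted R-squared penalises the fit for the number of predictors used
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

The result agrees with the printed adjusted value of 0.8183 to four decimals, up to the rounding of the input.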
##### Note: the object "p2_train_vars" holds all predictor variables (columns)
p2_train_regr_v1 <- as.formula(paste("SalePrice", "~",
paste(sort(colnames(p2_train_vars)), collapse = "+"),
sep = ""
))
## The resulting formula:
p2_train_regr_v1
## SalePrice ~ BedroomAbvGr + BsmtFinSF1 + BsmtFinSF2 + BsmtFinType2_q +
## BsmtFullBath + BsmtHalfBath + BsmtUnfSF + Electrical_q +
## EnclosedPorch + Fence_q + Fireplaces + Foundation_q + FullBath +
## GarageArea + GarageCars + GarageCond_q + GarageYrBlt + GrLivArea +
## HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea +
## LotFrontage + LowQualFinSF + MasVnrArea + MiscVal + MoSold +
## MSSubClass + OpenPorchSF + OverallCond + OverallQual + PoolArea +
## ScreenPorch + TotalBsmtSF + TotRmsAbvGrd + WoodDeckSF + X1stFlrSF +
## X2ndFlrSF + X3SsnPorch + YearBuilt + YearRemodAdd + YrSold
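As an alternative to pasting strings together, base R's reformulate() builds the same kind of formula directly from a character vector. The sketch below uses a three-name subset of the predictors purely for illustration and compares the two approaches.

```r
# A small subset of predictor names, standing in for colnames(p2_train_vars)
vars <- sort(c("GrLivArea", "OverallQual", "LotArea"))

# String-pasting approach, as used in this section
f_paste <- as.formula(paste("SalePrice", "~",
                            paste(vars, collapse = "+"),
                            sep = ""))

# Equivalent helper from the stats package
f_reform <- reformulate(vars, response = "SalePrice")

print(f_paste)
print(f_reform)
```

Both calls yield a formula with the same terms, so either object can be passed to lm().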
#--------------- Linear Model Version 1 --------------
lm_1 <- lm((p2_train_regr_v1),data = p2_train)
summary(lm_1)
##
## Call:
## lm(formula = (p2_train_regr_v1), data = p2_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -472691 -16414 -1979 13200 297806
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.261e+05 1.385e+06 0.669 0.503705
## BedroomAbvGr -8.213e+03 1.683e+03 -4.881 1.17e-06 ***
## BsmtFinSF1 1.986e+01 4.669e+00 4.255 2.23e-05 ***
## BsmtFinSF2 1.577e+01 1.048e+01 1.505 0.132474
## BsmtFinType2_q -7.895e+02 1.703e+03 -0.464 0.642987
## BsmtFullBath 8.031e+03 2.567e+03 3.129 0.001792 **
## BsmtHalfBath 1.919e+03 3.998e+03 0.480 0.631350
## BsmtUnfSF 9.120e+00 4.279e+00 2.131 0.033221 *
## Electrical_q -3.220e+03 2.401e+03 -1.341 0.180057
## EnclosedPorch 2.458e+00 1.653e+01 0.149 0.881788
## Fence_q -1.052e+03 7.984e+02 -1.317 0.187955
## Fireplaces 4.547e+03 1.734e+03 2.622 0.008841 **
## Foundation_q -3.347e+03 1.714e+03 -1.953 0.051019 .
## FullBath 3.184e+03 2.762e+03 1.153 0.249170
## GarageArea 3.283e+00 9.608e+00 0.342 0.732631
## GarageCars 1.515e+04 2.935e+03 5.162 2.79e-07 ***
## GarageCond_q 7.608e+02 4.137e+03 0.184 0.854106
## GarageYrBlt -1.528e+01 6.655e+00 -2.296 0.021813 *
## GrLivArea 4.411e+01 4.914e+00 8.976 < 2e-16 ***
## HalfBath -4.686e+02 2.627e+03 -0.178 0.858443
## HeatingQC_q 1.357e+03 1.199e+03 1.132 0.257952
## KitchenAbvGr -1.549e+04 5.176e+03 -2.993 0.002807 **
## KitchenQual_q 1.341e+04 2.085e+03 6.434 1.70e-10 ***
## LotArea 4.128e-01 9.852e-02 4.190 2.96e-05 ***
## LotFrontage 9.013e+00 2.807e+01 0.321 0.748199
## LowQualFinSF -2.675e+01 1.951e+01 -1.371 0.170611
## MasVnrArea 2.925e+01 5.819e+00 5.027 5.63e-07 ***
## MiscVal 2.387e-01 1.817e+00 0.131 0.895508
## MoSold -7.896e+01 3.365e+02 -0.235 0.814482
## MSSubClass -1.573e+02 2.629e+01 -5.985 2.74e-09 ***
## OpenPorchSF -9.371e+00 1.482e+01 -0.632 0.527361
## OverallCond 5.113e+03 1.038e+03 4.927 9.34e-07 ***
## OverallQual 1.520e+04 1.199e+03 12.682 < 2e-16 ***
## PoolArea -2.954e+01 2.355e+01 -1.254 0.209908
## ScreenPorch 5.503e+01 1.678e+01 3.280 0.001063 **
## TotalBsmtSF NA NA NA NA
## TotRmsAbvGrd 4.565e+03 1.211e+03 3.769 0.000171 ***
## WoodDeckSF 2.651e+01 7.824e+00 3.388 0.000723 ***
## X1stFlrSF -1.228e+00 5.306e+00 -0.231 0.817048
## X2ndFlrSF NA NA NA NA
## X3SsnPorch 2.230e+01 3.065e+01 0.728 0.466929
## YearBuilt 2.432e+02 6.659e+01 3.653 0.000269 ***
## YearRemodAdd -2.487e+01 7.007e+01 -0.355 0.722728
## YrSold -6.977e+02 6.862e+02 -1.017 0.309402
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33860 on 1418 degrees of freedom
## Multiple R-squared: 0.8234, Adjusted R-squared: 0.8183
## F-statistic: 161.3 on 41 and 1418 DF, p-value: < 2.2e-16
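The message "2 not defined because of singularities" usually means that some predictors are exact linear combinations of others. The six rows printed by head(p2_train_vars) are consistent with two such identities: TotalBsmtSF = BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF, and GrLivArea = X1stFlrSF + X2ndFlrSF + LowQualFinSF. The sketch below checks those six rows and then reproduces the NA behaviour on synthetic data (x1, x2, x3 and y are hypothetical). That the identities hold for all 1460 rows is an assumption here, not something the six rows prove.

```r
# First six training rows, copied from the head(p2_train_vars) table
six <- data.frame(
  BsmtFinSF1   = c(706, 978, 486, 216, 655, 732),
  BsmtFinSF2   = c(0, 0, 0, 0, 0, 0),
  BsmtUnfSF    = c(150, 284, 434, 540, 490, 64),
  TotalBsmtSF  = c(856, 1262, 920, 756, 1145, 796),
  X1stFlrSF    = c(856, 1262, 920, 961, 1145, 796),
  X2ndFlrSF    = c(854, 0, 866, 756, 1053, 566),
  LowQualFinSF = c(0, 0, 0, 0, 0, 0),
  GrLivArea    = c(1710, 1262, 1786, 1717, 2198, 1362)
)
bsmt_ok <- with(six, all(TotalBsmtSF == BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF))
area_ok <- with(six, all(GrLivArea == X1stFlrSF + X2ndFlrSF + LowQualFinSF))
print(c(bsmt_ok = bsmt_ok, area_ok = area_ok))

# Synthetic illustration: x3 is an exact sum of x1 and x2
set.seed(42)
x1 <- rnorm(50)
x2 <- rnorm(50)
x3 <- x1 + x2
y  <- 1 + 2 * x1 - x2 + rnorm(50)
fit_full    <- lm(y ~ x1 + x2 + x3)  # x3 is aliased, so its coefficient is NA
fit_reduced <- lm(y ~ x1 + x2)       # dropping x3 leaves the fit unchanged
print(coef(fit_full))
```

Because an aliased predictor carries no extra information, removing it changes neither the fitted values nor the R-squared, which is why two models that differ only in such terms print identical summaries.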
###############################################
# Removing "TotalBsmtSF" and "X2ndFlrSF" from "p2_train_vars" (their coefficients are NA above because of singularities)
p2_train_vars_2 <- subset(p2_train_vars, select = -c(TotalBsmtSF,X2ndFlrSF))
p2_train_regr_v2 <- as.formula(paste("SalePrice", "~",
paste(sort(colnames(p2_train_vars_2)), collapse = "+"),
sep = ""
))
p2_train_regr_v2
## SalePrice ~ BedroomAbvGr + BsmtFinSF1 + BsmtFinSF2 + BsmtFinType2_q +
## BsmtFullBath + BsmtHalfBath + BsmtUnfSF + Electrical_q +
## EnclosedPorch + Fence_q + Fireplaces + Foundation_q + FullBath +
## GarageArea + GarageCars + GarageCond_q + GarageYrBlt + GrLivArea +
## HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea +
## LotFrontage + LowQualFinSF + MasVnrArea + MiscVal + MoSold +
## MSSubClass + OpenPorchSF + OverallCond + OverallQual + PoolArea +
## ScreenPorch + TotRmsAbvGrd + WoodDeckSF + X1stFlrSF + X3SsnPorch +
## YearBuilt + YearRemodAdd + YrSold
#--------------- Linear Model Version 2 --------------
lm_2.lm <- lm((p2_train_regr_v2),data = p2_train)
summary(lm_2.lm)
##
## Call:
## lm(formula = (p2_train_regr_v2), data = p2_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -472691 -16414 -1979 13200 297806
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.261e+05 1.385e+06 0.669 0.503705
## BedroomAbvGr -8.213e+03 1.683e+03 -4.881 1.17e-06 ***
## BsmtFinSF1 1.986e+01 4.669e+00 4.255 2.23e-05 ***
## BsmtFinSF2 1.577e+01 1.048e+01 1.505 0.132474
## BsmtFinType2_q -7.895e+02 1.703e+03 -0.464 0.642987
## BsmtFullBath 8.031e+03 2.567e+03 3.129 0.001792 **
## BsmtHalfBath 1.919e+03 3.998e+03 0.480 0.631350
## BsmtUnfSF 9.120e+00 4.279e+00 2.131 0.033221 *
## Electrical_q -3.220e+03 2.401e+03 -1.341 0.180057
## EnclosedPorch 2.458e+00 1.653e+01 0.149 0.881788
## Fence_q -1.052e+03 7.984e+02 -1.317 0.187955
## Fireplaces 4.547e+03 1.734e+03 2.622 0.008841 **
## Foundation_q -3.347e+03 1.714e+03 -1.953 0.051019 .
## FullBath 3.184e+03 2.762e+03 1.153 0.249170
## GarageArea 3.283e+00 9.608e+00 0.342 0.732631
## GarageCars 1.515e+04 2.935e+03 5.162 2.79e-07 ***
## GarageCond_q 7.608e+02 4.137e+03 0.184 0.854106
## GarageYrBlt -1.528e+01 6.655e+00 -2.296 0.021813 *
## GrLivArea 4.411e+01 4.914e+00 8.976 < 2e-16 ***
## HalfBath -4.686e+02 2.627e+03 -0.178 0.858443
## HeatingQC_q 1.357e+03 1.199e+03 1.132 0.257952
## KitchenAbvGr -1.549e+04 5.176e+03 -2.993 0.002807 **
## KitchenQual_q 1.341e+04 2.085e+03 6.434 1.70e-10 ***
## LotArea 4.128e-01 9.852e-02 4.190 2.96e-05 ***
## LotFrontage 9.013e+00 2.807e+01 0.321 0.748199
## LowQualFinSF -2.675e+01 1.951e+01 -1.371 0.170611
## MasVnrArea 2.925e+01 5.819e+00 5.027 5.63e-07 ***
## MiscVal 2.387e-01 1.817e+00 0.131 0.895508
## MoSold -7.896e+01 3.365e+02 -0.235 0.814482
## MSSubClass -1.573e+02 2.629e+01 -5.985 2.74e-09 ***
## OpenPorchSF -9.371e+00 1.482e+01 -0.632 0.527361
## OverallCond 5.113e+03 1.038e+03 4.927 9.34e-07 ***
## OverallQual 1.520e+04 1.199e+03 12.682 < 2e-16 ***
## PoolArea -2.954e+01 2.355e+01 -1.254 0.209908
## ScreenPorch 5.503e+01 1.678e+01 3.280 0.001063 **
## TotRmsAbvGrd 4.565e+03 1.211e+03 3.769 0.000171 ***
## WoodDeckSF 2.651e+01 7.824e+00 3.388 0.000723 ***
## X1stFlrSF -1.228e+00 5.306e+00 -0.231 0.817048
## X3SsnPorch 2.230e+01 3.065e+01 0.728 0.466929
## YearBuilt 2.432e+02 6.659e+01 3.653 0.000269 ***
## YearRemodAdd -2.487e+01 7.007e+01 -0.355 0.722728
## YrSold -6.977e+02 6.862e+02 -1.017 0.309402
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33860 on 1418 degrees of freedom
## Multiple R-squared: 0.8234, Adjusted R-squared: 0.8183
## F-statistic: 161.3 on 41 and 1418 DF, p-value: < 2.2e-16
###############################################
Testing for and Freeing From Multicollinearity among Variables
Multicollinearity occurs when two or more predictor variables are highly correlated with each other, so that they do not provide unique or independent information to the regression model.
If the correlation between variables is high enough, it can cause problems when fitting and interpreting the regression model.
To test this model for multicollinearity we will use the “imcdiag” function from the “mctest” library and examine the Variance Inflation Factor (VIF) scores.
Note: scores over 5 indicate moderate multicollinearity; scores over 10 are very problematic.
Using the VIF measure, we see that most of the predictor variables have low VIF scores, indicating that they are not strongly correlated with the other predictors. BsmtFinSF1, GarageArea, GarageCars, X1stFlrSF and YearBuilt score between 5 and 6.2 and are left in the model. The following variables have the highest scores:
GarageCond_q: VIF score is 11.28 (problematic); will be removed from the model
GarageYrBlt: VIF score is 11.60 (problematic); will be removed from the model
GrLivArea: VIF score is 8.48 (moderately multicollinear); retained in the model
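For reference, the VIF of predictor j is \(1/(1 - R_j^{2})\), where \(R_j^{2}\) comes from regressing predictor j on all the other predictors; the TOL values printed by imcdiag are consistent with being its reciprocal. The following is a minimal base-R sketch on synthetic data. The variables x1, x2, x3 and the helper vif_manual are hypothetical, and this is not the mctest implementation.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1
x3 <- rnorm(n)                 # unrelated to the others
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    aux <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(aux)$r.squared)
  })
}

vif <- vif_manual(X)
print(round(vif, 2))      # x1 and x2 inflate each other; x3 stays near 1
print(round(1 / vif, 4))  # tolerance, the reciprocal of VIF
```

The same numbers can be read off the diagonal of the inverse correlation matrix, solve(cor(X)), which is a convenient cross-check.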
imcdiag(lm_2.lm)
##
## Call:
## imcdiag(mod = lm_2.lm)
##
##
## All Individual Multicollinearity Diagnostics Result
##
## VIF TOL Wi Fi Leamer CVIF Klein IND1
## BedroomAbvGr 2.3969 0.4172 49.5562 50.8626 0.6459 -0.0923 0 0.0118
## BsmtFinSF1 5.7687 0.1733 169.1698 173.6298 0.4164 -0.2220 1 0.0049
## BsmtFinSF2 3.6362 0.2750 93.5204 95.9860 0.5244 -0.1400 0 0.0078
## BsmtFinType2_q 2.9377 0.3404 68.7416 70.5539 0.5834 -0.1131 0 0.0096
## BsmtFullBath 2.2574 0.4430 44.6072 45.7832 0.6656 -0.0869 0 0.0125
## BsmtHalfBath 1.1592 0.8627 5.6461 5.7949 0.9288 -0.0446 0 0.0243
## BsmtUnfSF 4.5482 0.2199 125.8720 129.1905 0.4689 -0.1751 0 0.0062
## Electrical_q 1.2624 0.7922 9.3080 9.5534 0.8900 -0.0486 0 0.0223
## EnclosedPorch 1.2981 0.7704 10.5746 10.8533 0.8777 -0.0500 0 0.0217
## Fence_q 1.1765 0.8500 6.2612 6.4262 0.9219 -0.0453 0 0.0240
## Fireplaces 1.5906 0.6287 20.9503 21.5027 0.7929 -0.0612 0 0.0177
## Foundation_q 1.9496 0.5129 33.6875 34.5756 0.7162 -0.0750 0 0.0145
## FullBath 2.9457 0.3395 69.0246 70.8443 0.5826 -0.1134 0 0.0096
## GarageArea 5.3691 0.1863 154.9947 159.0809 0.4316 -0.2066 0 0.0053
## GarageCars 6.1208 0.1634 181.6593 186.4486 0.4042 -0.2356 1 0.0046
## GarageCond_q 11.2780 0.0887 364.6126 374.2252 0.2978 -0.4341 1 0.0025
## GarageYrBlt 11.5973 0.0862 375.9403 385.8515 0.2936 -0.4464 1 0.0024
## GrLivArea 8.4839 0.1179 265.4925 272.4919 0.3433 -0.3265 1 0.0033
## HalfBath 2.2207 0.4503 43.3030 44.4447 0.6711 -0.0855 0 0.0127
## HeatingQC_q 1.6831 0.5941 24.2324 24.8713 0.7708 -0.0648 0 0.0167
## KitchenAbvGr 1.6549 0.6043 23.2322 23.8446 0.7773 -0.0637 0 0.0170
## KitchenQual_q 2.4362 0.4105 50.9487 52.2919 0.6407 -0.0938 0 0.0116
## LotArea 1.2302 0.8128 8.1681 8.3834 0.9016 -0.0474 0 0.0229
## LotFrontage 1.2047 0.8301 7.2616 7.4530 0.9111 -0.0464 0 0.0234
## LowQualFinSF 1.1450 0.8734 5.1441 5.2797 0.9345 -0.0441 0 0.0246
## MasVnrArea 1.4070 0.7107 14.4380 14.8186 0.8431 -0.0542 0 0.0200
## MiscVal 1.0339 0.9672 1.2034 1.2351 0.9835 -0.0398 0 0.0273
## MoSold 1.0528 0.9498 1.8734 1.9228 0.9746 -0.0405 0 0.0268
## MSSubClass 1.5729 0.6358 20.3231 20.8589 0.7974 -0.0605 0 0.0179
## OpenPorchSF 1.2271 0.8149 8.0573 8.2697 0.9027 -0.0472 0 0.0230
## OverallCond 1.6967 0.5894 24.7163 25.3679 0.7677 -0.0653 0 0.0166
## OverallQual 3.4975 0.2859 88.5980 90.9338 0.5347 -0.1346 0 0.0081
## PoolArea 1.1386 0.8783 4.9172 5.0468 0.9372 -0.0438 0 0.0248
## ScreenPorch 1.1134 0.8981 4.0238 4.1299 0.9477 -0.0429 0 0.0253
## TotRmsAbvGrd 4.9303 0.2028 139.4256 143.1014 0.4504 -0.1898 0 0.0057
## WoodDeckSF 1.2235 0.8173 7.9287 8.1377 0.9041 -0.0471 0 0.0230
## X1stFlrSF 5.3540 0.1868 154.4569 158.5290 0.4322 -0.2061 0 0.0053
## X3SsnPorch 1.0272 0.9735 0.9661 0.9915 0.9867 -0.0395 0 0.0274
## YearBuilt 5.1461 0.1943 147.0833 150.9610 0.4408 -0.1981 0 0.0055
## YearRemodAdd 2.6625 0.3756 58.9784 60.5332 0.6128 -0.1025 0 0.0106
## YrSold 1.0567 0.9464 2.0098 2.0628 0.9728 -0.0407 0 0.0267
## IND2
## BedroomAbvGr 1.2757
## BsmtFinSF1 1.8095
## BsmtFinSF2 1.5870
## BsmtFinType2_q 1.4439
## BsmtFullBath 1.2193
## BsmtHalfBath 0.3006
## BsmtUnfSF 1.7077
## Electrical_q 0.4550
## EnclosedPorch 0.5027
## Fence_q 0.3284
## Fireplaces 0.8128
## Foundation_q 1.0662
## FullBath 1.4459
## GarageArea 1.7813
## GarageCars 1.8314
## GarageCond_q 1.9949
## GarageYrBlt 2.0002
## GrLivArea 1.9310
## HalfBath 1.2032
## HeatingQC_q 0.8884
## KitchenAbvGr 0.8662
## KitchenQual_q 1.2905
## LotArea 0.4097
## LotFrontage 0.3719
## LowQualFinSF 0.2772
## MasVnrArea 0.6332
## MiscVal 0.0718
## MoSold 0.1098
## MSSubClass 0.7973
## OpenPorchSF 0.4052
## OverallCond 0.8989
## OverallQual 1.5631
## PoolArea 0.2665
## ScreenPorch 0.2230
## TotRmsAbvGrd 1.7450
## WoodDeckSF 0.3999
## X1stFlrSF 1.7801
## X3SsnPorch 0.0580
## YearBuilt 1.7636
## YearRemodAdd 1.3668
## YrSold 0.1174
##
## 1 --> COLLINEARITY is detected by the test
## 0 --> COLLINEARITY is not detected by the test
##
## BsmtFinSF2 , BsmtFinType2_q , BsmtHalfBath , Electrical_q , EnclosedPorch , Fence_q , Foundation_q , FullBath , GarageArea , GarageCond_q , HalfBath , HeatingQC_q , LotFrontage , LowQualFinSF , MiscVal , MoSold , OpenPorchSF , PoolArea , X1stFlrSF , X3SsnPorch , YearRemodAdd , YrSold , coefficient(s) are non-significant may be due to multicollinearity
##
## R-square of y on all x: 0.8234
##
## * use method argument to check which regressors may be the reason of collinearity
## ===================================
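The Klein column can be cross-checked against the VIF column. Assuming it applies Klein's rule of thumb (flag a predictor when its auxiliary \(R^{2}\), computed as \(1 - 1/\text{VIF}\), exceeds the overall \(R^{2}\) of 0.8234), the flags printed above are reproduced for the predictors with the largest VIF scores. This reading of the column is an assumption based on the printed values, not on the mctest source.

```r
# VIF values copied from the imcdiag(lm_2.lm) output
vif <- c(BsmtFinSF1 = 5.7687, GarageArea = 5.3691, GarageCars = 6.1208,
         GarageCond_q = 11.2780, GarageYrBlt = 11.5973, GrLivArea = 8.4839,
         X1stFlrSF = 5.3540, YearBuilt = 5.1461)
r2_overall <- 0.8234   # "R-square of y on all x" from the same output

r2_aux <- 1 - 1 / vif  # auxiliary R-squared of each predictor on the rest
klein  <- as.integer(r2_aux > r2_overall)
print(data.frame(VIF = vif, R2_aux = round(r2_aux, 4), Klein = klein))
```

Under this assumption a VIF just above 5 is not enough to trigger the flag; the auxiliary \(R^{2}\) has to exceed 0.8234, which corresponds to a VIF of roughly 5.66.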
Regression Modeling V3
# We will remove the "GarageCond_q" and "GarageYrBlt" variables and recompute the Linear Model
p2_train_vars_3 <- subset(p2_train_vars_2, select = -c(GarageCond_q,GarageYrBlt))
p2_train_regr_v3 <- as.formula(paste("SalePrice", "~",
paste(sort(colnames(p2_train_vars_3)), collapse = "+"),
sep = ""
))
p2_train_regr_v3
## SalePrice ~ BedroomAbvGr + BsmtFinSF1 + BsmtFinSF2 + BsmtFinType2_q +
## BsmtFullBath + BsmtHalfBath + BsmtUnfSF + Electrical_q +
## EnclosedPorch + Fence_q + Fireplaces + Foundation_q + FullBath +
## GarageArea + GarageCars + GrLivArea + HalfBath + HeatingQC_q +
## KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage + LowQualFinSF +
## MasVnrArea + MiscVal + MoSold + MSSubClass + OpenPorchSF +
## OverallCond + OverallQual + PoolArea + ScreenPorch + TotRmsAbvGrd +
## WoodDeckSF + X1stFlrSF + X3SsnPorch + YearBuilt + YearRemodAdd +
## YrSold
#--------------- Linear Model Version 3 --------------
lm_3.lm <- lm((p2_train_regr_v3),data = p2_train)
summary(lm_3.lm)
##
## Call:
## lm(formula = (p2_train_regr_v3), data = p2_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -478319 -16598 -1832 13242 301926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.019e+06 1.397e+06 0.729 0.465881
## BedroomAbvGr -8.319e+03 1.696e+03 -4.904 1.05e-06 ***
## BsmtFinSF1 1.991e+01 4.706e+00 4.231 2.47e-05 ***
## BsmtFinSF2 1.308e+01 1.056e+01 1.239 0.215636
## BsmtFinType2_q -4.872e+02 1.718e+03 -0.284 0.776729
## BsmtFullBath 8.926e+03 2.585e+03 3.453 0.000571 ***
## BsmtHalfBath 2.044e+03 4.035e+03 0.507 0.612552
## BsmtUnfSF 9.721e+00 4.301e+00 2.260 0.023979 *
## Electrical_q -3.068e+03 2.399e+03 -1.279 0.201193
## EnclosedPorch 1.573e+00 1.668e+01 0.094 0.924866
## Fence_q -1.296e+03 8.045e+02 -1.611 0.107360
## Fireplaces 4.138e+03 1.748e+03 2.368 0.018012 *
## Foundation_q -3.794e+03 1.727e+03 -2.197 0.028199 *
## FullBath 4.213e+03 2.781e+03 1.515 0.130000
## GarageArea -3.242e+00 9.580e+00 -0.338 0.735097
## GarageCars 1.026e+04 2.808e+03 3.654 0.000268 ***
## GrLivArea 4.378e+01 4.957e+00 8.831 < 2e-16 ***
## HalfBath -1.437e+02 2.650e+03 -0.054 0.956770
## HeatingQC_q 1.149e+03 1.208e+03 0.951 0.341950
## KitchenAbvGr -1.289e+04 5.198e+03 -2.479 0.013276 *
## KitchenQual_q 1.348e+04 2.103e+03 6.412 1.95e-10 ***
## LotArea 4.195e-01 9.943e-02 4.219 2.61e-05 ***
## LotFrontage 1.968e+01 2.826e+01 0.696 0.486339
## LowQualFinSF -1.555e+01 1.954e+01 -0.796 0.426233
## MasVnrArea 3.216e+01 5.847e+00 5.500 4.49e-08 ***
## MiscVal 9.816e-02 1.834e+00 0.054 0.957319
## MoSold -8.641e+01 3.396e+02 -0.254 0.799172
## MSSubClass -1.567e+02 2.653e+01 -5.908 4.33e-09 ***
## OpenPorchSF -4.484e+00 1.493e+01 -0.300 0.764012
## OverallCond 4.533e+03 1.036e+03 4.376 1.30e-05 ***
## OverallQual 1.521e+04 1.210e+03 12.573 < 2e-16 ***
## PoolArea -3.086e+01 2.373e+01 -1.301 0.193611
## ScreenPorch 5.328e+01 1.692e+01 3.148 0.001676 **
## TotRmsAbvGrd 4.648e+03 1.222e+03 3.804 0.000149 ***
## WoodDeckSF 2.701e+01 7.892e+00 3.422 0.000639 ***
## X1stFlrSF -7.930e-02 5.337e+00 -0.015 0.988148
## X3SsnPorch 1.972e+01 3.093e+01 0.638 0.523896
## YearBuilt 2.204e+02 6.665e+01 3.307 0.000967 ***
## YearRemodAdd 1.820e+01 6.994e+01 0.260 0.794684
## YrSold -7.728e+02 6.925e+02 -1.116 0.264615
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34180 on 1420 degrees of freedom
## Multiple R-squared: 0.8198, Adjusted R-squared: 0.8149
## F-statistic: 165.7 on 39 and 1420 DF, p-value: < 2.2e-16
###############################################
imcdiag(lm_3.lm)
##
## Call:
## imcdiag(mod = lm_3.lm)
##
##
## All Individual Multicollinearity Diagnostics Result
##
## VIF TOL Wi Fi Leamer CVIF Klein IND1
## BedroomAbvGr 2.3914 0.4182 52.0308 53.4747 0.6467 -0.0968 0 0.0112
## BsmtFinSF1 5.7536 0.1738 177.7583 182.6911 0.4169 -0.2329 1 0.0046
## BsmtFinSF2 3.6237 0.2760 98.1124 100.8350 0.5253 -0.1467 0 0.0074
## BsmtFinType2_q 2.9336 0.3409 72.3071 74.3136 0.5838 -0.1188 0 0.0091
## BsmtFullBath 2.2470 0.4450 46.6298 47.9238 0.6671 -0.0910 0 0.0119
## BsmtHalfBath 1.1591 0.8627 5.9491 6.1142 0.9288 -0.0469 0 0.0231
## BsmtUnfSF 4.5116 0.2217 131.3137 134.9577 0.4708 -0.1826 0 0.0059
## Electrical_q 1.2370 0.8084 8.8621 9.1080 0.8991 -0.0501 0 0.0216
## EnclosedPorch 1.2977 0.7706 11.1336 11.4426 0.8778 -0.0525 0 0.0206
## Fence_q 1.1725 0.8529 6.4519 6.6309 0.9235 -0.0475 0 0.0228
## Fireplaces 1.5850 0.6309 21.8767 22.4838 0.7943 -0.0642 0 0.0169
## Foundation_q 1.9443 0.5143 35.3132 36.2932 0.7172 -0.0787 0 0.0138
## FullBath 2.9314 0.3411 72.2248 74.2290 0.5841 -0.1187 0 0.0091
## GarageArea 5.2399 0.1908 158.5491 162.9488 0.4369 -0.2121 0 0.0051
## GarageCars 5.5006 0.1818 168.2999 172.9702 0.4264 -0.2227 0 0.0049
## GrLivArea 8.4742 0.1180 279.4952 287.2511 0.3435 -0.3430 1 0.0032
## HalfBath 2.2176 0.4509 45.5307 46.7941 0.6715 -0.0898 0 0.0121
## HeatingQC_q 1.6788 0.5957 25.3823 26.0867 0.7718 -0.0680 0 0.0159
## KitchenAbvGr 1.6384 0.6104 23.8729 24.5353 0.7812 -0.0663 0 0.0163
## KitchenQual_q 2.4328 0.4110 53.5805 55.0673 0.6411 -0.0985 0 0.0110
## LotArea 1.2300 0.8130 8.6024 8.8411 0.9017 -0.0498 0 0.0217
## LotFrontage 1.1985 0.8344 7.4214 7.6274 0.9135 -0.0485 0 0.0223
## LowQualFinSF 1.1276 0.8868 4.7711 4.9035 0.9417 -0.0456 0 0.0237
## MasVnrArea 1.3945 0.7171 14.7538 15.1633 0.8468 -0.0565 0 0.0192
## MiscVal 1.0337 0.9674 1.2593 1.2943 0.9836 -0.0418 0 0.0259
## MoSold 1.0527 0.9500 1.9695 2.0242 0.9747 -0.0426 0 0.0254
## MSSubClass 1.5726 0.6359 21.4106 22.0048 0.7974 -0.0637 0 0.0170
## OpenPorchSF 1.2224 0.8180 8.3182 8.5490 0.9045 -0.0495 0 0.0219
## OverallCond 1.6596 0.6026 24.6647 25.3491 0.7762 -0.0672 0 0.0161
## OverallQual 3.4964 0.2860 93.3534 95.9439 0.5348 -0.1415 0 0.0076
## PoolArea 1.1352 0.8809 5.0546 5.1949 0.9386 -0.0460 0 0.0236
## ScreenPorch 1.1118 0.8994 4.1807 4.2967 0.9484 -0.0450 0 0.0241
## TotRmsAbvGrd 4.9256 0.2030 146.7952 150.8688 0.4506 -0.1994 0 0.0054
## WoodDeckSF 1.2221 0.8183 8.3050 8.5354 0.9046 -0.0495 0 0.0219
## X1stFlrSF 5.3168 0.1881 161.4269 165.9064 0.4337 -0.2152 0 0.0050
## X3SsnPorch 1.0268 0.9739 1.0040 1.0319 0.9868 -0.0416 0 0.0260
## YearBuilt 5.0606 0.1976 151.8454 156.0591 0.4445 -0.2049 0 0.0053
## YearRemodAdd 2.6038 0.3841 59.9727 61.6369 0.6197 -0.1054 0 0.0103
## YrSold 1.0562 0.9468 2.1017 2.1600 0.9730 -0.0428 0 0.0253
## IND2
## BedroomAbvGr 1.3522
## BsmtFinSF1 1.9201
## BsmtFinSF2 1.6826
## BsmtFinType2_q 1.5318
## BsmtFullBath 1.2897
## BsmtHalfBath 0.3190
## BsmtUnfSF 1.8089
## Electrical_q 0.4452
## EnclosedPorch 0.5332
## Fence_q 0.3420
## Fireplaces 0.8578
## Foundation_q 1.1287
## FullBath 1.5312
## GarageArea 1.8805
## GarageCars 1.9015
## GrLivArea 2.0497
## HalfBath 1.2760
## HeatingQC_q 0.9396
## KitchenAbvGr 0.9055
## KitchenQual_q 1.3687
## LotArea 0.4346
## LotFrontage 0.3848
## LowQualFinSF 0.2630
## MasVnrArea 0.6575
## MiscVal 0.0757
## MoSold 0.1163
## MSSubClass 0.8461
## OpenPorchSF 0.4229
## OverallCond 0.9236
## OverallQual 1.6593
## PoolArea 0.2767
## ScreenPorch 0.2337
## TotRmsAbvGrd 1.8522
## WoodDeckSF 0.4223
## X1stFlrSF 1.8869
## X3SsnPorch 0.0608
## YearBuilt 1.8647
## YearRemodAdd 1.4314
## YrSold 0.1237
##
## 1 --> COLLINEARITY is detected by the test
## 0 --> COLLINEARITY is not detected by the test
##
## BsmtFinSF2 , BsmtFinType2_q , BsmtHalfBath , Electrical_q , EnclosedPorch , Fence_q , FullBath , GarageArea , HalfBath , HeatingQC_q , LotFrontage , LowQualFinSF , MiscVal , MoSold , OpenPorchSF , PoolArea , X1stFlrSF , X3SsnPorch , YearRemodAdd , YrSold , coefficient(s) are non-significant may be due to multicollinearity
##
## R-square of y on all x: 0.8198
##
## * use method argument to check which regressors may be the reason of collinearity
## ===================================
Version 3 Discussion :
Note that after computing Linear Model V3, we have Multiple \(R^{2} = 0.8198\) and Adjusted \(R^{2} = 0.8149\), which is not a material difference from the V2 model.
Also, the multicollinearity test shows a VIF score > 8 for the following variable:
“GrLivArea” (VIF = 8.4742)
It will be removed in V4:
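For reference, the VIF and TOL columns in the diagnostics follow from the standard definitions: \(VIF_j = 1 / (1 - R_j^{2})\), where \(R_j^{2}\) comes from regressing predictor \(j\) on the remaining predictors, and \(TOL_j = 1 / VIF_j\) (for GrLivArea, 1 / 8.4742 ≈ 0.1180, matching the table). The sketch below illustrates the calculation on R's built-in mtcars data rather than the housing data; the helper name manual_vif is hypothetical and is not part of mctest.

```r
# Minimal sketch: compute VIF from its definition, VIF_j = 1 / (1 - R_j^2).
# `manual_vif` is a hypothetical helper; mtcars is used only for illustration.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on all of the other predictors
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

preds <- c("disp", "hp", "wt", "drat")
vif_values <- manual_vif(mtcars, preds)
tol_values <- 1 / vif_values   # tolerance is the reciprocal of VIF

round(cbind(VIF = vif_values, TOL = tol_values), 4)
```

A VIF close to 1 means the predictor is nearly uncorrelated with the others; larger values mean more of its variance is explained by the remaining predictors.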
Regression Modeling V4
# We will remove the "GrLivArea" variable:
p2_train_vars_4 <- subset(p2_train_vars_3, select = -c(GrLivArea))
p2_train_regr_v4 <- as.formula(paste("SalePrice", "~",
paste(sort(colnames(p2_train_vars_4)), collapse = "+"),
sep = ""
))
p2_train_regr_v4
## SalePrice ~ BedroomAbvGr + BsmtFinSF1 + BsmtFinSF2 + BsmtFinType2_q +
## BsmtFullBath + BsmtHalfBath + BsmtUnfSF + Electrical_q +
## EnclosedPorch + Fence_q + Fireplaces + Foundation_q + FullBath +
## GarageArea + GarageCars + HalfBath + HeatingQC_q + KitchenAbvGr +
## KitchenQual_q + LotArea + LotFrontage + LowQualFinSF + MasVnrArea +
## MiscVal + MoSold + MSSubClass + OpenPorchSF + OverallCond +
## OverallQual + PoolArea + ScreenPorch + TotRmsAbvGrd + WoodDeckSF +
## X1stFlrSF + X3SsnPorch + YearBuilt + YearRemodAdd + YrSold
#--------------- Linear Model Version 4 --------------
lm_4.lm <- lm((p2_train_regr_v4),data = p2_train)
summary(lm_4.lm)
##
## Call:
## lm(formula = (p2_train_regr_v4), data = p2_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -446391 -17796 -2727 14175 341600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.245e+06 1.435e+06 0.868 0.385580
## BedroomAbvGr -6.221e+03 1.724e+03 -3.607 0.000320 ***
## BsmtFinSF1 2.145e+01 4.829e+00 4.443 9.55e-06 ***
## BsmtFinSF2 1.211e+01 1.084e+01 1.117 0.264319
## BsmtFinType2_q -1.880e+02 1.763e+03 -0.107 0.915102
## BsmtFullBath 7.746e+03 2.650e+03 2.923 0.003526 **
## BsmtHalfBath 9.753e+02 4.141e+03 0.236 0.813843
## BsmtUnfSF 8.507e+00 4.414e+00 1.927 0.054149 .
## Electrical_q -3.719e+03 2.462e+03 -1.511 0.131080
## EnclosedPorch 9.446e+00 1.710e+01 0.552 0.580759
## Fence_q -1.432e+03 8.258e+02 -1.734 0.083097 .
## Fireplaces 5.829e+03 1.783e+03 3.268 0.001109 **
## Foundation_q -5.692e+03 1.760e+03 -3.235 0.001245 **
## FullBath 1.332e+04 2.652e+03 5.022 5.76e-07 ***
## GarageArea 6.983e+00 9.765e+00 0.715 0.474665
## GarageCars 8.707e+03 2.878e+03 3.025 0.002527 **
## HalfBath 1.119e+04 2.380e+03 4.703 2.81e-06 ***
## HeatingQC_q 2.142e+03 1.235e+03 1.734 0.083201 .
## KitchenAbvGr -1.935e+04 5.284e+03 -3.662 0.000260 ***
## KitchenQual_q 1.412e+04 2.158e+03 6.543 8.37e-11 ***
## LotArea 4.944e-01 1.017e-01 4.860 1.30e-06 ***
## LotFrontage 2.581e+01 2.901e+01 0.890 0.373782
## LowQualFinSF 1.069e+01 1.983e+01 0.539 0.590117
## MasVnrArea 3.772e+01 5.968e+00 6.320 3.49e-10 ***
## MiscVal 6.330e-02 1.883e+00 0.034 0.973184
## MoSold 1.115e+01 3.485e+02 0.032 0.974481
## MSSubClass -1.064e+02 2.660e+01 -4.001 6.62e-05 ***
## OpenPorchSF 6.207e+00 1.528e+01 0.406 0.684645
## OverallCond 4.143e+03 1.063e+03 3.899 0.000101 ***
## OverallQual 1.695e+04 1.226e+03 13.830 < 2e-16 ***
## PoolArea -1.204e+00 2.412e+01 -0.050 0.960196
## ScreenPorch 5.400e+01 1.737e+01 3.108 0.001921 **
## TotRmsAbvGrd 9.574e+03 1.116e+03 8.578 < 2e-16 ***
## WoodDeckSF 3.161e+01 8.086e+00 3.910 9.68e-05 ***
## X1stFlrSF 1.735e+01 5.092e+00 3.407 0.000675 ***
## X3SsnPorch 2.129e+01 3.176e+01 0.670 0.502755
## YearBuilt 2.657e+01 6.461e+01 0.411 0.680973
## YearRemodAdd 2.468e+01 7.181e+01 0.344 0.731161
## YrSold -7.093e+02 7.109e+02 -0.998 0.318553
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35090 on 1421 degrees of freedom
## Multiple R-squared: 0.8099, Adjusted R-squared: 0.8049
## F-statistic: 159.4 on 38 and 1421 DF, p-value: < 2.2e-16
imcdiag(lm_4.lm)
##
## Call:
## imcdiag(mod = lm_4.lm)
##
##
## All Individual Multicollinearity Diagnostics Result
##
## VIF TOL Wi Fi Leamer CVIF Klein IND1
## BedroomAbvGr 2.3445 0.4265 51.6719 53.1446 0.6531 -0.1128 0 0.0111
## BsmtFinSF1 5.7457 0.1740 182.3872 187.5853 0.4172 -0.2766 1 0.0045
## BsmtFinSF2 3.6233 0.2760 100.8198 103.6932 0.5253 -0.1744 0 0.0072
## BsmtFinType2_q 2.9325 0.3410 74.2697 76.3864 0.5840 -0.1412 0 0.0089
## BsmtFullBath 2.2410 0.4462 47.6933 49.0526 0.6680 -0.1079 0 0.0116
## BsmtHalfBath 1.1580 0.8635 6.0741 6.2472 0.9293 -0.0557 0 0.0225
## BsmtUnfSF 4.5070 0.2219 134.7807 138.6220 0.4710 -0.2169 0 0.0058
## Electrical_q 1.2358 0.8092 9.0630 9.3213 0.8995 -0.0595 0 0.0211
## EnclosedPorch 1.2940 0.7728 11.3001 11.6222 0.8791 -0.0623 0 0.0201
## Fence_q 1.1721 0.8532 6.6144 6.8029 0.9237 -0.0564 0 0.0222
## Fireplaces 1.5660 0.6386 21.7532 22.3732 0.7991 -0.0754 0 0.0166
## Foundation_q 1.9142 0.5224 35.1365 36.1379 0.7228 -0.0921 0 0.0136
## FullBath 2.5286 0.3955 58.7463 60.4206 0.6289 -0.1217 0 0.0103
## GarageArea 5.1634 0.1937 160.0077 164.5680 0.4401 -0.2485 0 0.0050
## GarageCars 5.4790 0.1825 172.1398 177.0459 0.4272 -0.2637 1 0.0047
## HalfBath 1.6971 0.5892 26.7905 27.5540 0.7676 -0.0817 0 0.0153
## HeatingQC_q 1.6642 0.6009 25.5282 26.2558 0.7752 -0.0801 0 0.0156
## KitchenAbvGr 1.6059 0.6227 23.2880 23.9517 0.7891 -0.0773 0 0.0162
## KitchenQual_q 2.4300 0.4115 54.9577 56.5240 0.6415 -0.1170 0 0.0107
## LotArea 1.2211 0.8189 8.4978 8.7400 0.9049 -0.0588 0 0.0213
## LotFrontage 1.1977 0.8349 7.5996 7.8162 0.9137 -0.0577 0 0.0217
## LowQualFinSF 1.1015 0.9078 3.9018 4.0130 0.9528 -0.0530 0 0.0236
## MasVnrArea 1.3784 0.7255 14.5417 14.9561 0.8518 -0.0663 0 0.0189
## MiscVal 1.0337 0.9674 1.2941 1.3310 0.9836 -0.0498 0 0.0252
## MoSold 1.0516 0.9510 1.9813 2.0378 0.9752 -0.0506 0 0.0247
## MSSubClass 1.5001 0.6666 19.2206 19.7684 0.8165 -0.0722 0 0.0173
## OpenPorchSF 1.2144 0.8234 8.2402 8.4751 0.9074 -0.0585 0 0.0214
## OverallCond 1.6566 0.6037 25.2333 25.9525 0.7770 -0.0797 0 0.0157
## OverallQual 3.4038 0.2938 92.3855 95.0185 0.5420 -0.1638 0 0.0076
## PoolArea 1.1124 0.8989 4.3211 4.4442 0.9481 -0.0535 0 0.0234
## ScreenPorch 1.1118 0.8995 4.2957 4.4181 0.9484 -0.0535 0 0.0234
## TotRmsAbvGrd 3.8987 0.2565 111.4045 114.5796 0.5065 -0.1877 0 0.0067
## WoodDeckSF 1.2168 0.8219 8.3307 8.5681 0.9066 -0.0586 0 0.0214
## X1stFlrSF 4.5901 0.2179 137.9758 141.9082 0.4668 -0.2209 0 0.0057
## X3SsnPorch 1.0268 0.9739 1.0306 1.0600 0.9869 -0.0494 0 0.0253
## YearBuilt 4.5118 0.2216 134.9689 138.8155 0.4708 -0.2172 0 0.0058
## YearRemodAdd 2.6035 0.3841 61.6259 63.3823 0.6198 -0.1253 0 0.0100
## YrSold 1.0561 0.9469 2.1556 2.2171 0.9731 -0.0508 0 0.0246
## IND2
## BedroomAbvGr 1.4110
## BsmtFinSF1 2.0322
## BsmtFinSF2 1.7814
## BsmtFinType2_q 1.6214
## BsmtFullBath 1.3625
## BsmtHalfBath 0.3358
## BsmtUnfSF 1.9145
## Electrical_q 0.4695
## EnclosedPorch 0.5591
## Fence_q 0.3613
## Fireplaces 0.8893
## Foundation_q 1.1751
## FullBath 1.4874
## GarageArea 1.9839
## GarageCars 2.0114
## HalfBath 1.0106
## HeatingQC_q 0.9820
## KitchenAbvGr 0.9284
## KitchenQual_q 1.4479
## LotArea 0.4455
## LotFrontage 0.4062
## LowQualFinSF 0.2268
## MasVnrArea 0.6754
## MiscVal 0.0801
## MoSold 0.1206
## MSSubClass 0.8203
## OpenPorchSF 0.4344
## OverallCond 0.9752
## OverallQual 1.7376
## PoolArea 0.2487
## ScreenPorch 0.2474
## TotRmsAbvGrd 1.8293
## WoodDeckSF 0.4383
## X1stFlrSF 1.9244
## X3SsnPorch 0.0643
## YearBuilt 1.9151
## YearRemodAdd 1.5154
## YrSold 0.1307
##
## 1 --> COLLINEARITY is detected by the test
## 0 --> COLLINEARITY is not detected by the test
##
## BsmtFinSF2 , BsmtFinType2_q , BsmtHalfBath , BsmtUnfSF , Electrical_q , EnclosedPorch , Fence_q , GarageArea , HeatingQC_q , LotFrontage , LowQualFinSF , MiscVal , MoSold , OpenPorchSF , PoolArea , X3SsnPorch , YearBuilt , YearRemodAdd , YrSold , coefficient(s) are non-significant may be due to multicollinearity
##
## R-square of y on all x: 0.8099
##
## * use method argument to check which regressors may be the reason of collinearity
## ===================================
Version 4 Discussion :
Note that after computing Linear Model V4, we have Multiple \(R^{2} = 0.8099\) and Adjusted \(R^{2} = 0.8049\), a slight decline from V3.
As a final tuning step, V5 of the model removes the variables with VIF scores > 3. This is a more conservative threshold, as suggested by some researchers: https://quantifyinghealth.com/vif-threshold/
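The selection of variables to drop can be sketched as a simple threshold filter. The VIF values below are copied from the V4 diagnostics table above (only a subset of rows is included for brevity); the object names vif_v4, vif_threshold, and drop_vars are illustrative and do not appear elsewhere in this report.

```r
# VIF values copied from the V4 imcdiag output (subset of rows, for brevity)
vif_v4 <- c(
  BedroomAbvGr = 2.3445, BsmtFinSF1 = 5.7457, BsmtFinSF2 = 3.6233,
  BsmtFinType2_q = 2.9325, BsmtUnfSF = 4.5070, GarageArea = 5.1634,
  GarageCars = 5.4790, OverallQual = 3.4038, TotRmsAbvGrd = 3.8987,
  X1stFlrSF = 4.5901, YearBuilt = 4.5118, YearRemodAdd = 2.6035
)

vif_threshold <- 3   # conservative cutoff used for V5

# Names of the predictors whose VIF exceeds the cutoff
drop_vars <- names(vif_v4)[vif_v4 > vif_threshold]
drop_vars
```

Every row omitted from this subset has a VIF below 3 in the V4 table, so the filter yields the same nine variables that are removed in V5.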
Regression Modeling V5
# We will remove the "BsmtFinSF1", "BsmtFinSF2", "BsmtUnfSF", "GarageArea", "GarageCars", "OverallQual", "TotRmsAbvGrd", "X1stFlrSF", "YearBuilt" variables:
p2_train_vars_5 <- subset(p2_train_vars_4, select = -c(BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,GarageArea,GarageCars,OverallQual,TotRmsAbvGrd,X1stFlrSF,YearBuilt))
p2_train_regr_v5 <- as.formula(paste("SalePrice", "~",
paste(sort(colnames(p2_train_vars_5)), collapse = "+"),
sep = ""
))
p2_train_regr_v5
## SalePrice ~ BedroomAbvGr + BsmtFinType2_q + BsmtFullBath + BsmtHalfBath +
## Electrical_q + EnclosedPorch + Fence_q + Fireplaces + Foundation_q +
## FullBath + HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q +
## LotArea + LotFrontage + LowQualFinSF + MasVnrArea + MiscVal +
## MoSold + MSSubClass + OpenPorchSF + OverallCond + PoolArea +
## ScreenPorch + WoodDeckSF + X3SsnPorch + YearRemodAdd + YrSold
#--------------- Linear Model Version 5 --------------
lm_5.lm <- lm((p2_train_regr_v5),data = p2_train)
summary(lm_5.lm)
##
## Call:
## lm(formula = (p2_train_regr_v5), data = p2_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -348505 -24685 -3607 19110 381768
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.594e+06 1.729e+06 1.501 0.133664
## BedroomAbvGr 1.727e+03 1.671e+03 1.034 0.301320
## BsmtFinType2_q 2.316e+01 1.315e+03 0.018 0.985952
## BsmtFullBath 1.896e+04 2.404e+03 7.888 6.06e-15 ***
## BsmtHalfBath 2.940e+03 4.859e+03 0.605 0.545166
## Electrical_q -9.376e+02 2.922e+03 -0.321 0.748349
## EnclosedPorch 1.846e+01 1.930e+01 0.957 0.338954
## Fence_q -2.196e+03 9.962e+02 -2.205 0.027631 *
## Fireplaces 2.023e+04 1.998e+03 10.127 < 2e-16 ***
## Foundation_q -8.058e+03 1.933e+03 -4.169 3.25e-05 ***
## FullBath 3.456e+04 2.956e+03 11.692 < 2e-16 ***
## HalfBath 1.574e+04 2.523e+03 6.237 5.85e-10 ***
## HeatingQC_q 3.496e+03 1.480e+03 2.362 0.018305 *
## KitchenAbvGr -9.404e+03 5.833e+03 -1.612 0.107172
## KitchenQual_q 3.477e+04 2.417e+03 14.383 < 2e-16 ***
## LotArea 6.541e-01 1.216e-01 5.381 8.63e-08 ***
## LotFrontage 1.617e+02 3.446e+01 4.692 2.97e-06 ***
## LowQualFinSF 3.620e+01 2.343e+01 1.545 0.122576
## MasVnrArea 8.466e+01 6.802e+00 12.446 < 2e-16 ***
## MiscVal 1.326e+00 2.276e+00 0.583 0.560242
## MoSold 6.738e+01 4.213e+02 0.160 0.872944
## MSSubClass -1.824e+02 3.008e+01 -6.065 1.68e-09 ***
## OpenPorchSF 5.187e+01 1.817e+01 2.855 0.004372 **
## OverallCond 2.469e+03 1.142e+03 2.162 0.030755 *
## PoolArea 2.082e+01 2.894e+01 0.720 0.471877
## ScreenPorch 7.410e+01 2.090e+01 3.546 0.000403 ***
## WoodDeckSF 5.198e+01 9.735e+00 5.340 1.08e-07 ***
## X3SsnPorch 3.561e+01 3.842e+01 0.927 0.354057
## YearRemodAdd 1.829e+02 8.294e+01 2.205 0.027597 *
## YrSold -1.486e+03 8.603e+02 -1.727 0.084323 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42590 on 1430 degrees of freedom
## Multiple R-squared: 0.7183, Adjusted R-squared: 0.7126
## F-statistic: 125.8 on 29 and 1430 DF, p-value: < 2.2e-16
imcdiag(lm_5.lm)
##
## Call:
## imcdiag(mod = lm_5.lm)
##
##
## All Individual Multicollinearity Diagnostics Result
##
## VIF TOL Wi Fi Leamer CVIF Klein IND1 IND2
## BedroomAbvGr 1.4940 0.6693 25.2490 26.2025 0.8181 -0.2998 0 0.0131 1.6610
## BsmtFinType2_q 1.1075 0.9029 5.4932 5.7006 0.9502 -0.2223 0 0.0177 0.4875
## BsmtFullBath 1.2515 0.7990 12.8531 13.3385 0.8939 -0.2512 0 0.0156 1.0094
## BsmtHalfBath 1.0826 0.9237 4.2205 4.3799 0.9611 -0.2173 0 0.0181 0.3832
## Electrical_q 1.1823 0.8458 9.3193 9.6712 0.9197 -0.2373 0 0.0165 0.7747
## EnclosedPorch 1.1191 0.8936 6.0846 6.3144 0.9453 -0.2246 0 0.0175 0.5344
## Fence_q 1.1582 0.8634 8.0854 8.3907 0.9292 -0.2324 0 0.0169 0.6861
## Fireplaces 1.3347 0.7492 17.1067 17.7527 0.8656 -0.2679 0 0.0147 1.2597
## Foundation_q 1.5690 0.6374 29.0775 30.1755 0.7984 -0.3149 0 0.0125 1.8215
## FullBath 2.1337 0.4687 57.9386 60.1265 0.6846 -0.4282 0 0.0092 2.6688
## HalfBath 1.2956 0.7719 15.1057 15.6761 0.8786 -0.2600 0 0.0151 1.1459
## HeatingQC_q 1.6222 0.6164 31.8004 33.0013 0.7851 -0.3256 0 0.0121 1.9266
## KitchenAbvGr 1.3290 0.7524 16.8147 17.4497 0.8674 -0.2667 0 0.0147 1.2435
## KitchenQual_q 2.0714 0.4828 54.7556 56.8233 0.6948 -0.4157 0 0.0094 2.5980
## LotArea 1.1842 0.8444 9.4145 9.7700 0.9189 -0.2377 0 0.0165 0.7813
## LotFrontage 1.1481 0.8710 7.5678 7.8535 0.9333 -0.2304 0 0.0170 0.6479
## LowQualFinSF 1.0441 0.9578 2.2526 2.3377 0.9787 -0.2095 0 0.0187 0.2120
## MasVnrArea 1.2160 0.8224 11.0368 11.4536 0.9069 -0.2440 0 0.0161 0.8921
## MiscVal 1.0260 0.9747 1.3288 1.3790 0.9872 -0.2059 0 0.0191 0.1273
## MoSold 1.0437 0.9582 2.2317 2.3160 0.9789 -0.2095 0 0.0187 0.2102
## MSSubClass 1.3024 0.7678 15.4542 16.0378 0.8763 -0.2614 0 0.0150 1.1662
## OpenPorchSF 1.1661 0.8575 8.4901 8.8107 0.9260 -0.2340 0 0.0168 0.7156
## OverallCond 1.2988 0.7700 15.2701 15.8467 0.8775 -0.2607 0 0.0151 1.1555
## PoolArea 1.0874 0.9196 4.4681 4.6368 0.9590 -0.2182 0 0.0180 0.4038
## ScreenPorch 1.0920 0.9157 4.7039 4.8816 0.9569 -0.2192 0 0.0179 0.4234
## WoodDeckSF 1.1978 0.8348 10.1113 10.4932 0.9137 -0.2404 0 0.0163 0.8296
## X3SsnPorch 1.0204 0.9800 1.0429 1.0823 0.9900 -0.2048 0 0.0192 0.1004
## YearRemodAdd 2.3590 0.4239 69.4554 72.0782 0.6511 -0.4734 0 0.0083 2.8937
## YrSold 1.0503 0.9521 2.5723 2.6695 0.9757 -0.2108 0 0.0186 0.2407
##
## 1 --> COLLINEARITY is detected by the test
## 0 --> COLLINEARITY is not detected by the test
##
## BedroomAbvGr , BsmtFinType2_q , BsmtHalfBath , Electrical_q , EnclosedPorch , KitchenAbvGr , LowQualFinSF , MiscVal , MoSold , PoolArea , X3SsnPorch , YrSold , coefficient(s) are non-significant may be due to multicollinearity
##
## R-square of y on all x: 0.7183
##
## * use method argument to check which regressors may be the reason of collinearity
## ===================================
Version 5 Discussion :
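Note that after computing Linear Model V5, we have Multiple \(R^{2} = 0.7183\) and Adjusted \(R^{2} = 0.7126\). This is a decline from V4 (0.8099 and 0.8049), which is not in the expected direction: removing the nine variables with VIF > 3 reduced multicollinearity but also cost the model explanatory power.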
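The adjusted figures quoted in the discussions can be checked against the usual relationship \(R^{2}_{adj} = 1 - (1 - R^{2})\frac{n - 1}{n - p - 1}\). The sketch below applies it to the rounded Multiple \(R^{2}\) values from the V3, V4, and V5 summaries. It assumes n = 1460 observations, inferred from the residual degrees of freedom (for V5, 1430 + 29 predictors + 1 intercept); adj_r2 is an illustrative helper, not a function used elsewhere in this report.

```r
# Adjusted R^2 from multiple R^2, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n_obs <- 1460  # assumed: residual df + predictors + intercept (1430 + 29 + 1)

models <- data.frame(
  model = c("V3", "V4", "V5"),
  r2    = c(0.8198, 0.8099, 0.7183),  # Multiple R-squared from the summaries
  p     = c(39, 38, 29)               # predictor counts from the F-statistic lines
)
models$adj_r2 <- adj_r2(models$r2, n_obs, models$p)
models
```

Because the inputs are rounded to four decimals, the results can differ from the printed Adjusted R-squared values in the last digit.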
Final Model Adjustments
We will refine our model using the stepAIC procedure, which searches for a model with a low AIC score.
stepAIC is one of the most commonly used search methods for feature selection. At each step it drops or adds a single term so as to lower the AIC, and it stops when no single change lowers the AIC further. stepAIC does not necessarily improve model performance; rather, it simplifies the model without much impact on performance.
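As a small illustration of how the procedure can be called and its result kept for later use, here is a sketch on R's built-in mtcars data (not the housing data). The object names full_model and step_model are illustrative. Setting trace = FALSE suppresses the step-by-step log, and the returned object is the selected lm fit.

```r
library(MASS)  # provides stepAIC

# Illustration on built-in data, not the housing data
full_model <- lm(mpg ~ disp + hp + wt + drat + qsec, data = mtcars)

# trace = FALSE suppresses the per-step log; the result is the selected lm fit
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

formula(step_model)          # terms kept by the search
extractAIC(full_model)[2]    # AIC of the starting model
extractAIC(step_model)[2]    # AIC of the selected model (never higher)
step_model$anova             # record of the terms dropped or added at each step
```

Assigning the result to an object makes it possible to pass the selected model directly to summary() or predict() instead of retyping the final formula.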
We will use the stepAIC procedure to determine the final model components.
The final model is the one with the lowest AIC score:
The scores for our model range from AIC=31154.79 to AIC=31141.34.
We will use the model with score AIC=31141.34.
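For linear models, the AIC value printed by the stepwise search is the one returned by extractAIC, which equals \(n \log(RSS / n) + 2k\), where \(k\) is the number of estimated coefficients. The sketch below recomputes the starting score from values read off this report: the RSS on the `<none>` row of the first step of the trace, and n and k inferred from the lm_5 summary (1430 residual degrees of freedom, 29 predictors plus an intercept). The variable names are illustrative.

```r
# Values read from the lm_5 summary and the first step of the stepAIC trace
n   <- 1460        # observations: 1430 residual df + 29 predictors + intercept
k   <- 30          # estimated coefficients: 29 predictors + intercept
rss <- 2.5934e12   # residual sum of squares of the starting model (rounded)

# AIC as computed by extractAIC() for lm fits: n * log(RSS / n) + 2 * k
aic_start <- n * log(rss / n) + 2 * k
aic_start
```

Since the RSS is printed with only five significant digits, the result should land close to, rather than exactly on, the reported Start: AIC=31154.79. This version of AIC omits an additive constant, so it is useful for comparing models fitted to the same data but will not match the value from AIC().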
#### - stepAIC
stepAIC(lm_5.lm, direction="both")
## Start: AIC=31154.79
## SalePrice ~ BedroomAbvGr + BsmtFinType2_q + BsmtFullBath + BsmtHalfBath +
## Electrical_q + EnclosedPorch + Fence_q + Fireplaces + Foundation_q +
## FullBath + HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q +
## LotArea + LotFrontage + LowQualFinSF + MasVnrArea + MiscVal +
## MoSold + MSSubClass + OpenPorchSF + OverallCond + PoolArea +
## ScreenPorch + WoodDeckSF + X3SsnPorch + YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - BsmtFinType2_q 1 5.6243e+05 2.5934e+12 31153
## - MoSold 1 4.6398e+07 2.5935e+12 31153
## - Electrical_q 1 1.8673e+08 2.5936e+12 31153
## - MiscVal 1 6.1561e+08 2.5940e+12 31153
## - BsmtHalfBath 1 6.6417e+08 2.5941e+12 31153
## - PoolArea 1 9.3916e+08 2.5944e+12 31153
## - X3SsnPorch 1 1.5586e+09 2.5950e+12 31154
## - EnclosedPorch 1 1.6594e+09 2.5951e+12 31154
## - BedroomAbvGr 1 1.9389e+09 2.5954e+12 31154
## <none> 2.5934e+12 31155
## - LowQualFinSF 1 4.3289e+09 2.5977e+12 31155
## - KitchenAbvGr 1 4.7129e+09 2.5981e+12 31155
## - YrSold 1 5.4112e+09 2.5988e+12 31156
## - OverallCond 1 8.4800e+09 2.6019e+12 31158
## - Fence_q 1 8.8155e+09 2.6022e+12 31158
## - YearRemodAdd 1 8.8194e+09 2.6022e+12 31158
## - HeatingQC_q 1 1.0119e+10 2.6035e+12 31159
## - OpenPorchSF 1 1.4778e+10 2.6082e+12 31161
## - ScreenPorch 1 2.2807e+10 2.6162e+12 31166
## - Foundation_q 1 3.1514e+10 2.6249e+12 31170
## - LotFrontage 1 3.9923e+10 2.6333e+12 31175
## - WoodDeckSF 1 5.1709e+10 2.6451e+12 31182
## - LotArea 1 5.2519e+10 2.6459e+12 31182
## - MSSubClass 1 6.6715e+10 2.6601e+12 31190
## - HalfBath 1 7.0554e+10 2.6640e+12 31192
## - BsmtFullBath 1 1.1284e+11 2.7062e+12 31215
## - Fireplaces 1 1.8599e+11 2.7794e+12 31254
## - FullBath 1 2.4791e+11 2.8413e+12 31286
## - MasVnrArea 1 2.8094e+11 2.8743e+12 31303
## - KitchenQual_q 1 3.7518e+11 2.9686e+12 31350
##
## Step: AIC=31152.79
## SalePrice ~ BedroomAbvGr + BsmtFullBath + BsmtHalfBath + Electrical_q +
## EnclosedPorch + Fence_q + Fireplaces + Foundation_q + FullBath +
## HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea +
## LotFrontage + LowQualFinSF + MasVnrArea + MiscVal + MoSold +
## MSSubClass + OpenPorchSF + OverallCond + PoolArea + ScreenPorch +
## WoodDeckSF + X3SsnPorch + YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - MoSold 1 4.6241e+07 2.5935e+12 31151
## - Electrical_q 1 1.8620e+08 2.5936e+12 31151
## - MiscVal 1 6.1617e+08 2.5940e+12 31151
## - BsmtHalfBath 1 6.7729e+08 2.5941e+12 31151
## - PoolArea 1 9.3863e+08 2.5944e+12 31151
## - X3SsnPorch 1 1.5584e+09 2.5950e+12 31152
## - EnclosedPorch 1 1.6676e+09 2.5951e+12 31152
## - BedroomAbvGr 1 1.9388e+09 2.5954e+12 31152
## <none> 2.5934e+12 31153
## - LowQualFinSF 1 4.3299e+09 2.5977e+12 31153
## - KitchenAbvGr 1 4.7388e+09 2.5982e+12 31154
## - YrSold 1 5.4112e+09 2.5988e+12 31154
## + BsmtFinType2_q 1 5.6243e+05 2.5934e+12 31155
## - OverallCond 1 8.4924e+09 2.6019e+12 31156
## - YearRemodAdd 1 8.8195e+09 2.6022e+12 31156
## - Fence_q 1 8.8855e+09 2.6023e+12 31156
## - HeatingQC_q 1 1.0126e+10 2.6035e+12 31157
## - OpenPorchSF 1 1.4781e+10 2.6082e+12 31159
## - ScreenPorch 1 2.2847e+10 2.6163e+12 31164
## - Foundation_q 1 3.1694e+10 2.6251e+12 31169
## - LotFrontage 1 3.9923e+10 2.6333e+12 31173
## - WoodDeckSF 1 5.1831e+10 2.6452e+12 31180
## - LotArea 1 5.2676e+10 2.6461e+12 31180
## - MSSubClass 1 6.6735e+10 2.6601e+12 31188
## - HalfBath 1 7.0555e+10 2.6640e+12 31190
## - BsmtFullBath 1 1.1601e+11 2.7094e+12 31215
## - Fireplaces 1 1.8601e+11 2.7794e+12 31252
## - FullBath 1 2.4791e+11 2.8413e+12 31284
## - MasVnrArea 1 2.8143e+11 2.8748e+12 31301
## - KitchenQual_q 1 3.7526e+11 2.9687e+12 31348
##
## Step: AIC=31150.82
## SalePrice ~ BedroomAbvGr + BsmtFullBath + BsmtHalfBath + Electrical_q +
## EnclosedPorch + Fence_q + Fireplaces + Foundation_q + FullBath +
## HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea +
## LotFrontage + LowQualFinSF + MasVnrArea + MiscVal + MSSubClass +
## OpenPorchSF + OverallCond + PoolArea + ScreenPorch + WoodDeckSF +
## X3SsnPorch + YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - Electrical_q 1 1.8698e+08 2.5936e+12 31149
## - MiscVal 1 6.1528e+08 2.5941e+12 31149
## - BsmtHalfBath 1 6.8627e+08 2.5941e+12 31149
## - PoolArea 1 9.1889e+08 2.5944e+12 31149
## - X3SsnPorch 1 1.5787e+09 2.5950e+12 31150
## - EnclosedPorch 1 1.6544e+09 2.5951e+12 31150
## - BedroomAbvGr 1 1.9580e+09 2.5954e+12 31150
## <none> 2.5935e+12 31151
## - LowQualFinSF 1 4.3066e+09 2.5978e+12 31151
## - KitchenAbvGr 1 4.7098e+09 2.5982e+12 31152
## - YrSold 1 5.6781e+09 2.5991e+12 31152
## + MoSold 1 4.6241e+07 2.5934e+12 31153
## + BsmtFinType2_q 1 4.0577e+05 2.5935e+12 31153
## - OverallCond 1 8.4819e+09 2.6019e+12 31154
## - YearRemodAdd 1 8.8201e+09 2.6023e+12 31154
## - Fence_q 1 8.8690e+09 2.6023e+12 31154
## - HeatingQC_q 1 1.0112e+10 2.6036e+12 31155
## - OpenPorchSF 1 1.4917e+10 2.6084e+12 31157
## - ScreenPorch 1 2.2894e+10 2.6164e+12 31162
## - Foundation_q 1 3.1651e+10 2.6251e+12 31167
## - LotFrontage 1 3.9948e+10 2.6334e+12 31171
## - WoodDeckSF 1 5.1925e+10 2.6454e+12 31178
## - LotArea 1 5.2632e+10 2.6461e+12 31178
## - MSSubClass 1 6.6792e+10 2.6602e+12 31186
## - HalfBath 1 7.0513e+10 2.6640e+12 31188
## - BsmtFullBath 1 1.1596e+11 2.7094e+12 31213
## - Fireplaces 1 1.8644e+11 2.7799e+12 31250
## - FullBath 1 2.4800e+11 2.8415e+12 31282
## - MasVnrArea 1 2.8155e+11 2.8750e+12 31299
## - KitchenQual_q 1 3.7648e+11 2.9699e+12 31347
##
## Step: AIC=31148.92
## SalePrice ~ BedroomAbvGr + BsmtFullBath + BsmtHalfBath + EnclosedPorch +
## Fence_q + Fireplaces + Foundation_q + FullBath + HalfBath +
## HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage +
## LowQualFinSF + MasVnrArea + MiscVal + MSSubClass + OpenPorchSF +
## OverallCond + PoolArea + ScreenPorch + WoodDeckSF + X3SsnPorch +
## YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - MiscVal 1 5.9413e+08 2.5942e+12 31147
## - BsmtHalfBath 1 6.6863e+08 2.5943e+12 31147
## - PoolArea 1 9.2519e+08 2.5946e+12 31147
## - X3SsnPorch 1 1.5840e+09 2.5952e+12 31148
## - EnclosedPorch 1 1.7541e+09 2.5954e+12 31148
## - BedroomAbvGr 1 1.8876e+09 2.5955e+12 31148
## <none> 2.5936e+12 31149
## - LowQualFinSF 1 4.3109e+09 2.5980e+12 31149
## - KitchenAbvGr 1 4.5488e+09 2.5982e+12 31150
## - YrSold 1 5.7366e+09 2.5994e+12 31150
## + Electrical_q 1 1.8698e+08 2.5935e+12 31151
## + MoSold 1 4.7017e+07 2.5936e+12 31151
## + BsmtFinType2_q 1 3.8670e+03 2.5936e+12 31151
## - OverallCond 1 8.3354e+09 2.6020e+12 31152
## - YearRemodAdd 1 8.6341e+09 2.6023e+12 31152
## - Fence_q 1 9.0162e+09 2.6027e+12 31152
## - HeatingQC_q 1 1.0104e+10 2.6037e+12 31153
## - OpenPorchSF 1 1.4897e+10 2.6085e+12 31155
## - ScreenPorch 1 2.2919e+10 2.6166e+12 31160
## - Foundation_q 1 3.1586e+10 2.6252e+12 31165
## - LotFrontage 1 4.0029e+10 2.6337e+12 31169
## - WoodDeckSF 1 5.1802e+10 2.6454e+12 31176
## - LotArea 1 5.2627e+10 2.6463e+12 31176
## - MSSubClass 1 6.7306e+10 2.6610e+12 31184
## - HalfBath 1 7.0434e+10 2.6641e+12 31186
## - BsmtFullBath 1 1.1610e+11 2.7097e+12 31211
## - Fireplaces 1 1.8629e+11 2.7799e+12 31248
## - FullBath 1 2.4782e+11 2.8415e+12 31280
## - MasVnrArea 1 2.8145e+11 2.8751e+12 31297
## - KitchenQual_q 1 3.7635e+11 2.9700e+12 31345
##
## Step: AIC=31147.26
## SalePrice ~ BedroomAbvGr + BsmtFullBath + BsmtHalfBath + EnclosedPorch +
## Fence_q + Fireplaces + Foundation_q + FullBath + HalfBath +
## HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage +
## LowQualFinSF + MasVnrArea + MSSubClass + OpenPorchSF + OverallCond +
## PoolArea + ScreenPorch + WoodDeckSF + X3SsnPorch + YearRemodAdd +
## YrSold
##
## Df Sum of Sq RSS AIC
## - BsmtHalfBath 1 6.4343e+08 2.5949e+12 31146
## - PoolArea 1 9.8078e+08 2.5952e+12 31146
## - X3SsnPorch 1 1.5822e+09 2.5958e+12 31146
## - EnclosedPorch 1 1.7949e+09 2.5960e+12 31146
## - BedroomAbvGr 1 1.8535e+09 2.5961e+12 31146
## <none> 2.5942e+12 31147
## - LowQualFinSF 1 4.3018e+09 2.5985e+12 31148
## - KitchenAbvGr 1 4.3334e+09 2.5986e+12 31148
## - YrSold 1 5.7461e+09 2.6000e+12 31149
## + MiscVal 1 5.9413e+08 2.5936e+12 31149
## + Electrical_q 1 1.6582e+08 2.5941e+12 31149
## + MoSold 1 4.6090e+07 2.5942e+12 31149
## + BsmtFinType2_q 1 1.6124e+05 2.5942e+12 31149
## - OverallCond 1 8.6709e+09 2.6029e+12 31150
## - YearRemodAdd 1 8.7414e+09 2.6030e+12 31150
## - Fence_q 1 8.9307e+09 2.6032e+12 31150
## - HeatingQC_q 1 1.0118e+10 2.6044e+12 31151
## - OpenPorchSF 1 1.4833e+10 2.6091e+12 31154
## - ScreenPorch 1 2.3212e+10 2.6175e+12 31158
## - Foundation_q 1 3.1933e+10 2.6262e+12 31163
## - LotFrontage 1 3.9556e+10 2.6338e+12 31167
## - WoodDeckSF 1 5.1747e+10 2.6460e+12 31174
## - LotArea 1 5.3262e+10 2.6475e+12 31175
## - MSSubClass 1 6.7865e+10 2.6621e+12 31183
## - HalfBath 1 7.0716e+10 2.6650e+12 31185
## - BsmtFullBath 1 1.1578e+11 2.7100e+12 31209
## - Fireplaces 1 1.8644e+11 2.7807e+12 31247
## - FullBath 1 2.4789e+11 2.8421e+12 31279
## - MasVnrArea 1 2.8129e+11 2.8755e+12 31296
## - KitchenQual_q 1 3.7579e+11 2.9700e+12 31343
##
## Step: AIC=31145.62
## SalePrice ~ BedroomAbvGr + BsmtFullBath + EnclosedPorch + Fence_q +
## Fireplaces + Foundation_q + FullBath + HalfBath + HeatingQC_q +
## KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage + LowQualFinSF +
## MasVnrArea + MSSubClass + OpenPorchSF + OverallCond + PoolArea +
## ScreenPorch + WoodDeckSF + X3SsnPorch + YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - PoolArea 1 1.0159e+09 2.5959e+12 31144
## - X3SsnPorch 1 1.6705e+09 2.5966e+12 31145
## - EnclosedPorch 1 1.7544e+09 2.5966e+12 31145
## - BedroomAbvGr 1 1.9922e+09 2.5969e+12 31145
## <none> 2.5949e+12 31146
## - LowQualFinSF 1 4.2366e+09 2.5991e+12 31146
## - KitchenAbvGr 1 4.4287e+09 2.5993e+12 31146
## - YrSold 1 5.9091e+09 2.6008e+12 31147
## + BsmtHalfBath 1 6.4343e+08 2.5942e+12 31147
## + MiscVal 1 5.6894e+08 2.5943e+12 31147
## + Electrical_q 1 1.4997e+08 2.5947e+12 31148
## + MoSold 1 5.4750e+07 2.5948e+12 31148
## + BsmtFinType2_q 1 1.0796e+07 2.5949e+12 31148
## - YearRemodAdd 1 8.9424e+09 2.6038e+12 31149
## - Fence_q 1 8.9643e+09 2.6038e+12 31149
## - OverallCond 1 9.0629e+09 2.6039e+12 31149
## - HeatingQC_q 1 9.9983e+09 2.6049e+12 31149
## - OpenPorchSF 1 1.4786e+10 2.6097e+12 31152
## - ScreenPorch 1 2.3409e+10 2.6183e+12 31157
## - Foundation_q 1 3.1816e+10 2.6267e+12 31161
## - LotFrontage 1 3.9353e+10 2.6342e+12 31166
## - WoodDeckSF 1 5.2621e+10 2.6475e+12 31173
## - LotArea 1 5.4133e+10 2.6490e+12 31174
## - MSSubClass 1 6.7445e+10 2.6623e+12 31181
## - HalfBath 1 7.0207e+10 2.6651e+12 31183
## - BsmtFullBath 1 1.1618e+11 2.7111e+12 31208
## - Fireplaces 1 1.8736e+11 2.7822e+12 31245
## - FullBath 1 2.4761e+11 2.8425e+12 31277
## - MasVnrArea 1 2.8349e+11 2.8784e+12 31295
## - KitchenQual_q 1 3.7585e+11 2.9707e+12 31341
##
## Step: AIC=31144.19
## SalePrice ~ BedroomAbvGr + BsmtFullBath + EnclosedPorch + Fence_q +
## Fireplaces + Foundation_q + FullBath + HalfBath + HeatingQC_q +
## KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage + LowQualFinSF +
## MasVnrArea + MSSubClass + OpenPorchSF + OverallCond + ScreenPorch +
## WoodDeckSF + X3SsnPorch + YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - X3SsnPorch 1 1.6690e+09 2.5976e+12 31143
## - EnclosedPorch 1 1.9091e+09 2.5978e+12 31143
## - BedroomAbvGr 1 2.0693e+09 2.5980e+12 31143
## <none> 2.5959e+12 31144
## - LowQualFinSF 1 4.4151e+09 2.6003e+12 31145
## - KitchenAbvGr 1 4.5340e+09 2.6004e+12 31145
## + PoolArea 1 1.0159e+09 2.5949e+12 31146
## - YrSold 1 6.2523e+09 2.6022e+12 31146
## + BsmtHalfBath 1 6.7857e+08 2.5952e+12 31146
## + MiscVal 1 6.2370e+08 2.5953e+12 31146
## + Electrical_q 1 1.5454e+08 2.5957e+12 31146
## + MoSold 1 3.2209e+07 2.5959e+12 31146
## + BsmtFinType2_q 1 6.0394e+06 2.5959e+12 31146
## - Fence_q 1 8.2796e+09 2.6042e+12 31147
## - YearRemodAdd 1 8.9360e+09 2.6048e+12 31147
## - OverallCond 1 8.9848e+09 2.6049e+12 31147
## - HeatingQC_q 1 9.5788e+09 2.6055e+12 31148
## - OpenPorchSF 1 1.5029e+10 2.6109e+12 31151
## - ScreenPorch 1 2.3823e+10 2.6197e+12 31156
## - Foundation_q 1 3.1964e+10 2.6279e+12 31160
## - LotFrontage 1 4.1127e+10 2.6370e+12 31165
## - WoodDeckSF 1 5.3365e+10 2.6493e+12 31172
## - LotArea 1 5.4778e+10 2.6507e+12 31173
## - MSSubClass 1 6.6787e+10 2.6627e+12 31179
## - HalfBath 1 7.0110e+10 2.6660e+12 31181
## - BsmtFullBath 1 1.1744e+11 2.7133e+12 31207
## - Fireplaces 1 1.8887e+11 2.7848e+12 31245
## - FullBath 1 2.4831e+11 2.8442e+12 31276
## - MasVnrArea 1 2.8274e+11 2.8786e+12 31293
## - KitchenQual_q 1 3.7885e+11 2.9747e+12 31341
##
## Step: AIC=31143.13
## SalePrice ~ BedroomAbvGr + BsmtFullBath + EnclosedPorch + Fence_q +
## Fireplaces + Foundation_q + FullBath + HalfBath + HeatingQC_q +
## KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage + LowQualFinSF +
## MasVnrArea + MSSubClass + OpenPorchSF + OverallCond + ScreenPorch +
## WoodDeckSF + YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - EnclosedPorch 1 1.7937e+09 2.5994e+12 31142
## - BedroomAbvGr 1 1.9290e+09 2.5995e+12 31142
## <none> 2.5976e+12 31143
## - LowQualFinSF 1 4.4558e+09 2.6020e+12 31144
## - KitchenAbvGr 1 4.5787e+09 2.6021e+12 31144
## + X3SsnPorch 1 1.6690e+09 2.5959e+12 31144
## + PoolArea 1 1.0144e+09 2.5966e+12 31145
## - YrSold 1 6.1546e+09 2.6037e+12 31145
## + BsmtHalfBath 1 7.6899e+08 2.5968e+12 31145
## + MiscVal 1 6.2007e+08 2.5969e+12 31145
## + Electrical_q 1 1.5842e+08 2.5974e+12 31145
## + MoSold 1 5.0677e+07 2.5975e+12 31145
## + BsmtFinType2_q 1 5.8051e+06 2.5976e+12 31145
## - Fence_q 1 8.1271e+09 2.6057e+12 31146
## - YearRemodAdd 1 8.9260e+09 2.6065e+12 31146
## - OverallCond 1 9.3505e+09 2.6069e+12 31146
## - HeatingQC_q 1 9.8788e+09 2.6074e+12 31147
## - OpenPorchSF 1 1.4801e+10 2.6124e+12 31149
## - ScreenPorch 1 2.3379e+10 2.6209e+12 31154
## - Foundation_q 1 3.2527e+10 2.6301e+12 31159
## - LotFrontage 1 4.1319e+10 2.6389e+12 31164
## - WoodDeckSF 1 5.2433e+10 2.6500e+12 31170
## - LotArea 1 5.5232e+10 2.6528e+12 31172
## - MSSubClass 1 6.7759e+10 2.6653e+12 31179
## - HalfBath 1 7.0324e+10 2.6679e+12 31180
## - BsmtFullBath 1 1.1751e+11 2.7151e+12 31206
## - Fireplaces 1 1.8924e+11 2.7868e+12 31244
## - FullBath 1 2.5042e+11 2.8480e+12 31276
## - MasVnrArea 1 2.8345e+11 2.8810e+12 31292
## - KitchenQual_q 1 3.7784e+11 2.9754e+12 31339
##
## Step: AIC=31142.14
## SalePrice ~ BedroomAbvGr + BsmtFullBath + Fence_q + Fireplaces +
## Foundation_q + FullBath + HalfBath + HeatingQC_q + KitchenAbvGr +
## KitchenQual_q + LotArea + LotFrontage + LowQualFinSF + MasVnrArea +
## MSSubClass + OpenPorchSF + OverallCond + ScreenPorch + WoodDeckSF +
## YearRemodAdd + YrSold
##
## Df Sum of Sq RSS AIC
## - BedroomAbvGr 1 2.1315e+09 2.6015e+12 31141
## <none> 2.5994e+12 31142
## - KitchenAbvGr 1 4.5631e+09 2.6039e+12 31143
## - LowQualFinSF 1 4.6323e+09 2.6040e+12 31143
## + EnclosedPorch 1 1.7937e+09 2.5976e+12 31143
## + X3SsnPorch 1 1.5535e+09 2.5978e+12 31143
## + PoolArea 1 1.1642e+09 2.5982e+12 31144
## - YrSold 1 6.1589e+09 2.6055e+12 31144
## + BsmtHalfBath 1 7.2368e+08 2.5986e+12 31144
## + MiscVal 1 6.6696e+08 2.5987e+12 31144
## + Electrical_q 1 2.5250e+08 2.5991e+12 31144
## + MoSold 1 3.4117e+07 2.5993e+12 31144
## + BsmtFinType2_q 1 1.9561e+07 2.5993e+12 31144
## - Fence_q 1 7.9341e+09 2.6073e+12 31145
## - YearRemodAdd 1 8.1863e+09 2.6075e+12 31145
## - OverallCond 1 9.6087e+09 2.6090e+12 31146
## - HeatingQC_q 1 1.0043e+10 2.6094e+12 31146
## - OpenPorchSF 1 1.4414e+10 2.6138e+12 31148
## - ScreenPorch 1 2.2229e+10 2.6216e+12 31153
## - Foundation_q 1 3.1218e+10 2.6306e+12 31158
## - LotFrontage 1 4.2054e+10 2.6414e+12 31164
## - WoodDeckSF 1 5.1107e+10 2.6505e+12 31169
## - LotArea 1 5.5078e+10 2.6544e+12 31171
## - MSSubClass 1 6.7268e+10 2.6666e+12 31177
## - HalfBath 1 6.9432e+10 2.6688e+12 31179
## - BsmtFullBath 1 1.1720e+11 2.7166e+12 31205
## - Fireplaces 1 1.9137e+11 2.7907e+12 31244
## - FullBath 1 2.4973e+11 2.8491e+12 31274
## - MasVnrArea 1 2.8181e+11 2.8812e+12 31290
## - KitchenQual_q 1 3.8328e+11 2.9826e+12 31341
##
## Step: AIC=31141.34
## SalePrice ~ BsmtFullBath + Fence_q + Fireplaces + Foundation_q +
## FullBath + HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q +
## LotArea + LotFrontage + LowQualFinSF + MasVnrArea + MSSubClass +
## OpenPorchSF + OverallCond + ScreenPorch + WoodDeckSF + YearRemodAdd +
## YrSold
##
## Df Sum of Sq RSS AIC
## <none> 2.6015e+12 31141
## - KitchenAbvGr 1 3.6961e+09 2.6052e+12 31141
## + BedroomAbvGr 1 2.1315e+09 2.5994e+12 31142
## + EnclosedPorch 1 1.9962e+09 2.5995e+12 31142
## - LowQualFinSF 1 5.3295e+09 2.6068e+12 31142
## + X3SsnPorch 1 1.4055e+09 2.6001e+12 31143
## + PoolArea 1 1.2570e+09 2.6002e+12 31143
## + BsmtHalfBath 1 8.7035e+08 2.6006e+12 31143
## - YrSold 1 6.3919e+09 2.6079e+12 31143
## + MiscVal 1 6.3019e+08 2.6009e+12 31143
## + Electrical_q 1 1.6847e+08 2.6013e+12 31143
## + MoSold 1 4.9617e+07 2.6014e+12 31143
## + BsmtFinType2_q 1 2.3554e+07 2.6015e+12 31143
## - YearRemodAdd 1 7.2376e+09 2.6087e+12 31143
## - Fence_q 1 7.2819e+09 2.6088e+12 31143
## - HeatingQC_q 1 9.8954e+09 2.6114e+12 31145
## - OverallCond 1 1.0868e+10 2.6124e+12 31145
## - OpenPorchSF 1 1.4411e+10 2.6159e+12 31147
## - ScreenPorch 1 2.2514e+10 2.6240e+12 31152
## - Foundation_q 1 3.0523e+10 2.6320e+12 31156
## - LotFrontage 1 4.4215e+10 2.6457e+12 31164
## - WoodDeckSF 1 5.1393e+10 2.6529e+12 31168
## - LotArea 1 5.6539e+10 2.6580e+12 31171
## - MSSubClass 1 7.2283e+10 2.6738e+12 31179
## - HalfBath 1 8.2888e+10 2.6844e+12 31185
## - BsmtFullBath 1 1.1551e+11 2.7170e+12 31203
## - Fireplaces 1 1.9096e+11 2.7925e+12 31243
## - MasVnrArea 1 2.8291e+11 2.8844e+12 31290
## - FullBath 1 3.2009e+11 2.9216e+12 31309
## - KitchenQual_q 1 3.8128e+11 2.9828e+12 31339
##
## Call:
## lm(formula = SalePrice ~ BsmtFullBath + Fence_q + Fireplaces +
## Foundation_q + FullBath + HalfBath + HeatingQC_q + KitchenAbvGr +
## KitchenQual_q + LotArea + LotFrontage + LowQualFinSF + MasVnrArea +
## MSSubClass + OpenPorchSF + OverallCond + ScreenPorch + WoodDeckSF +
## YearRemodAdd + YrSold, data = p2_train)
##
## Coefficients:
## (Intercept) BsmtFullBath Fence_q Fireplaces Foundation_q
## 2.845e+06 1.850e+04 -1.954e+03 2.041e+04 -7.830e+03
## FullBath HalfBath HeatingQC_q KitchenAbvGr KitchenQual_q
## 3.577e+04 1.632e+04 3.442e+03 -8.099e+03 3.467e+04
## LotArea LotFrontage LowQualFinSF MasVnrArea MSSubClass
## 6.739e-01 1.679e+02 3.988e+01 8.451e+01 -1.869e+02
## OpenPorchSF OverallCond ScreenPorch WoodDeckSF YearRemodAdd
## 5.106e+01 2.748e+03 7.291e+01 5.129e+01 1.611e+02
## YrSold
## -1.592e+03
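The "Step: AIC" values in the trace above can be reproduced by hand. For a linear model, this kind of stepwise search scores each candidate as n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below plugs in the rounded RSS values printed in the last two steps. The sample size n = 1460 is an assumption inferred from the final fit (1439 residual degrees of freedom plus 21 coefficients), so the results should agree with the trace only up to rounding.

```r
# Reproduce two "Step: AIC" values from the printed (rounded) RSS.
# Assumption: n = 1460 training rows (1439 residual df + 21 coefficients).
n <- 1460

# RSS of the <none> row in the last two steps of the trace.
rss <- c(with_bedroom = 2.5994e12, final = 2.6015e12)
# Estimated coefficients: intercept + 21 predictors, then intercept + 20.
edf <- c(with_bedroom = 22, final = 21)

aic_step <- n * log(rss / n) + 2 * edf
print(round(aic_step, 1))
```

Dropping BedroomAbvGr raises the RSS slightly, but the penalty term falls by 2, which is why the smaller model ends up with the lower score.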
Final Model
This model is the result of 5 iterations of the original model, followed by a stepAIC computation that produced the final model below:
## The following model computed the lowest score of AIC=31141.34
lm_final.lm <- lm((SalePrice ~ BsmtFullBath + Fence_q + Fireplaces + Foundation_q + FullBath + HalfBath + HeatingQC_q + KitchenAbvGr + KitchenQual_q + LotArea + LotFrontage + LowQualFinSF + MasVnrArea + MSSubClass + OpenPorchSF + OverallCond + ScreenPorch + WoodDeckSF + YearRemodAdd + YrSold), data = p2_train)
summary(lm_final.lm)
##
## Call:
## lm(formula = (SalePrice ~ BsmtFullBath + Fence_q + Fireplaces +
## Foundation_q + FullBath + HalfBath + HeatingQC_q + KitchenAbvGr +
## KitchenQual_q + LotArea + LotFrontage + LowQualFinSF + MasVnrArea +
## MSSubClass + OpenPorchSF + OverallCond + ScreenPorch + WoodDeckSF +
## YearRemodAdd + YrSold), data = p2_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -341177 -24710 -3586 19167 391779
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.845e+06 1.700e+06 1.674 0.09434 .
## BsmtFullBath 1.850e+04 2.314e+03 7.993 2.67e-15 ***
## Fence_q -1.954e+03 9.734e+02 -2.007 0.04494 *
## Fireplaces 2.041e+04 1.986e+03 10.278 < 2e-16 ***
## Foundation_q -7.830e+03 1.906e+03 -4.109 4.20e-05 ***
## FullBath 3.577e+04 2.688e+03 13.306 < 2e-16 ***
## HalfBath 1.632e+04 2.411e+03 6.771 1.86e-11 ***
## HeatingQC_q 3.442e+03 1.471e+03 2.340 0.01944 *
## KitchenAbvGr -8.099e+03 5.664e+03 -1.430 0.15298
## KitchenQual_q 3.467e+04 2.388e+03 14.523 < 2e-16 ***
## LotArea 6.739e-01 1.205e-01 5.592 2.68e-08 ***
## LotFrontage 1.679e+02 3.396e+01 4.945 8.49e-07 ***
## LowQualFinSF 3.988e+01 2.323e+01 1.717 0.08620 .
## MasVnrArea 8.451e+01 6.755e+00 12.510 < 2e-16 ***
## MSSubClass -1.869e+02 2.956e+01 -6.323 3.41e-10 ***
## OpenPorchSF 5.106e+01 1.808e+01 2.823 0.00482 **
## OverallCond 2.748e+03 1.121e+03 2.452 0.01433 *
## ScreenPorch 7.291e+01 2.066e+01 3.529 0.00043 ***
## WoodDeckSF 5.129e+01 9.620e+00 5.332 1.13e-07 ***
## YearRemodAdd 1.611e+02 8.051e+01 2.001 0.04560 *
## YrSold -1.592e+03 8.464e+02 -1.880 0.06027 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42520 on 1439 degrees of freedom
## Multiple R-squared: 0.7175, Adjusted R-squared: 0.7135
## F-statistic: 182.7 on 20 and 1439 DF, p-value: < 2.2e-16
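As a quick consistency check, the adjusted R-squared and the F-statistic in the summary above follow directly from the multiple R-squared and the degrees of freedom. The sketch below uses only the rounded values printed in the summary, so small differences in the last digit are expected.

```r
# Values copied from the printed summary (rounded).
r2     <- 0.7175   # multiple R-squared
p      <- 20       # predictors, excluding the intercept
df_res <- 1439     # residual degrees of freedom
n      <- df_res + p + 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom.
f_stat <- (r2 / p) / ((1 - r2) / df_res)

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
```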
Residuals Discussion
The histogram of the residuals is roughly bell-shaped and centered at approximately 0, although the residual summary above (minimum -341177, maximum 391779, against a residual standard error of 42520) points to heavy tails.
In the Q-Q plot, points that lie close to the red line indicate a normal distribution. If the points deviate drastically from the line, consider adjusting the model by adding or removing variables; the model shown here is the result of that adjustment.
hist(lm_final.lm$residuals, prob = TRUE)
abline(v = mean(lm_final.lm$residuals), # Add line for mean
col = "red",
lwd = 3)
lines(density(lm_final.lm$residuals),col = "blue")
qqnorm(lm_final.lm$residuals)
qqline(lm_final.lm$residuals, col = "red")
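The visual checks above can be complemented with numeric ones. The sketch below is a minimal illustration on simulated stand-in data (the variables `x`, `y`, and `fit` are hypothetical, not the housing model): it computes the sample skewness and excess kurtosis of the standardized residuals and runs a Shapiro-Wilk test, all in base R. The `moments` package loaded at the top of this report provides comparable skewness and kurtosis helpers.

```r
set.seed(1)
# Simulated stand-in data (hypothetical; not the housing data).
n <- 500
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rt(n, df = 3)   # heavy-tailed errors on purpose
fit <- lm(y ~ x)

res <- rstandard(fit)            # standardized residuals

# Moment-based sample skewness and excess kurtosis.
skew <- mean((res - mean(res))^3) / sd(res)^3
kurt <- mean((res - mean(res))^4) / sd(res)^4 - 3

# Shapiro-Wilk test; R's implementation accepts 3 to 5000 values.
sw <- shapiro.test(res)

print(c(skewness = skew, excess_kurtosis = kurt, shapiro_p = sw$p.value))
```

A small p-value or a large excess kurtosis suggests tails heavier than normal, which a histogram alone can hide.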
The plot of residuals against fitted values is broadly consistent with a linear relationship, but there is some evidence of heteroskedastic behavior.
plot(lm_final.lm$fitted.values, lm_final.lm$residuals,
xlab="Fitted Values", ylab="Residuals",
main="Residuals Plot",col = "blue")
abline(h=0)
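Heteroskedasticity can also be checked with a formal test rather than by eye. The sketch below, on simulated stand-in data (hypothetical `x`, `y`, and `fit`, not the housing model), implements the studentized (Koenker) form of the Breusch-Pagan test in base R: regress the squared residuals on the predictors and compare n * R-squared to a chi-square distribution with as many degrees of freedom as there are predictors.

```r
set.seed(2)
# Simulated stand-in data (hypothetical): error spread grows with x.
n <- 500
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor.
aux <- lm(resid(fit)^2 ~ x)

# LM statistic = n * R^2 of the auxiliary regression; df = 1 predictor.
lm_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

print(c(statistic = lm_stat, p_value = bp_p))
```

A small p-value indicates that the residual variance changes with the predictors, in which case the reported standard errors should be read with caution.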
Predicting the Test Data
p2_test %>% select(order(colnames(p2_test)))
str(p2_test)
## 'data.frame': 1459 obs. of 88 variables:
## $ Id : int 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 ...
## $ MSSubClass : int 20 20 60 60 120 60 20 60 20 20 ...
## $ MSZoning : chr "RH" "RL" "RL" "RL" ...
## $ LotFrontage : num 80 81 74 78 43 75 0 63 85 70 ...
## $ LotArea : int 11622 14267 13830 9978 5005 10000 7980 8402 10176 8400 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr "0" "0" "0" "0" ...
## $ LotShape : chr "Reg" "IR1" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "Corner" "Inside" "Inside" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "NAmes" "NAmes" "Gilbert" "Gilbert" ...
## $ Condition1 : chr "Feedr" "Norm" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "1Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 5 6 5 6 8 6 6 6 7 4 ...
## $ OverallCond : int 6 6 5 6 5 5 7 5 5 5 ...
## $ YearBuilt : int 1961 1958 1997 1998 1992 1993 1992 1998 1990 1970 ...
## $ YearRemodAdd : int 1961 1958 1998 1998 1992 1994 2007 1998 1990 1970 ...
## $ RoofStyle : chr "Gable" "Hip" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "Wd Sdng" "VinylSd" "VinylSd" ...
## $ Exterior2nd : chr "VinylSd" "Wd Sdng" "VinylSd" "VinylSd" ...
## $ MasVnrType : chr "None" "BrkFace" "None" "BrkFace" ...
## $ MasVnrArea : num 0 108 0 20 0 0 0 0 0 0 ...
## $ ExterQual : chr "TA" "TA" "TA" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "CBlock" "CBlock" "PConc" "PConc" ...
## $ BsmtQual : chr "TA" "TA" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "TA" ...
## $ BsmtExposure : chr "No" "No" "No" "No" ...
## $ BsmtFinType1 : chr "Rec" "ALQ" "GLQ" "GLQ" ...
## $ BsmtFinSF1 : num 468 923 791 602 263 0 935 0 637 804 ...
## $ BsmtFinType2 : chr "LwQ" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : num 144 0 0 0 0 0 0 0 0 78 ...
## $ BsmtUnfSF : num 270 406 137 324 1017 ...
## $ TotalBsmtSF : num 882 1329 928 926 1280 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "TA" "TA" "Gd" "Ex" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 896 1329 928 926 1280 763 1187 789 1341 882 ...
## $ X2ndFlrSF : int 0 0 701 678 0 892 0 676 0 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 896 1329 1629 1604 1280 1655 1187 1465 1341 882 ...
## $ BsmtFullBath : num 0 0 0 0 0 0 1 0 1 1 ...
## $ BsmtHalfBath : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 1 1 2 2 2 2 2 2 1 1 ...
## $ HalfBath : int 0 1 1 1 0 1 0 1 1 0 ...
## $ BedroomAbvGr : int 2 3 3 3 2 3 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 1 1 ...
## $ KitchenQual : chr "TA" "Gd" "TA" "Gd" ...
## $ TotRmsAbvGrd : int 5 6 6 7 5 7 6 7 5 4 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 0 1 1 0 1 0 1 1 0 ...
## $ FireplaceQu : chr "0" "0" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Attchd" ...
## $ GarageYrBlt : num 1961 1958 1997 1998 1992 ...
## $ GarageFinish : chr "Unf" "Unf" "Fin" "Fin" ...
## $ GarageCars : num 1 1 2 2 2 2 2 2 2 2 ...
## $ GarageArea : num 730 312 482 470 506 440 420 393 506 525 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 140 393 212 360 0 157 483 0 192 240 ...
## $ OpenPorchSF : int 0 36 34 36 82 84 21 75 0 0 ...
## $ EnclosedPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ScreenPorch : int 120 0 0 0 144 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr "0" "0" "0" "0" ...
## $ Fence : chr "MnPrv" "0" "MnPrv" "0" ...
## $ MiscFeature : chr "0" "Gar2" "0" "0" ...
## $ MiscVal : int 0 12500 0 0 0 0 500 0 0 0 ...
## $ MoSold : int 6 6 3 6 1 4 3 5 2 4 ...
## $ YrSold : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition : chr "Normal" "Normal" "Normal" "Normal" ...
## $ SalePrice : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Foundation_q : num 5 5 4 4 4 4 4 4 4 5 ...
## $ BsmtFinType2_q: num 2 1 1 1 1 1 1 1 1 3 ...
## $ HeatingQC_q : num 3 3 4 5 5 4 5 4 4 3 ...
## $ Electrical_q : num 5 5 5 5 5 5 5 5 5 5 ...
## $ KitchenQual_q : num 3 4 3 4 4 3 3 3 4 3 ...
## $ GarageCond_q : num 3 3 3 3 3 3 3 3 3 3 ...
## $ Fence_q : num 3 0 3 0 0 0 4 0 0 3 ...
kable(head(p2_test))

| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | Foundation_q | BsmtFinType2_q | HeatingQC_q | Electrical_q | KitchenQual_q | GarageCond_q | Fence_q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1461 | 20 | RH | 80 | 11622 | Pave | 0 | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | CBlock | TA | TA | No | Rec | 468 | LwQ | 144 | 270 | 882 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 0 | 0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | 0 | Attchd | 1961 | Unf | 1 | 730 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | 0 | MnPrv | 0 | 0 | 6 | 2010 | WD | Normal | 0 | 5 | 2 | 3 | 5 | 3 | 3 | 3 |
| 1462 | 20 | RL | 81 | 14267 | Pave | 0 | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 6 | 1958 | 1958 | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | 108 | TA | TA | CBlock | TA | TA | No | ALQ | 923 | Unf | 0 | 406 | 1329 | GasA | TA | Y | SBrkr | 1329 | 0 | 0 | 1329 | 0 | 0 | 1 | 1 | 3 | 1 | Gd | 6 | Typ | 0 | 0 | Attchd | 1958 | Unf | 1 | 312 | TA | TA | Y | 393 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | Gar2 | 12500 | 6 | 2010 | WD | Normal | 0 | 5 | 1 | 3 | 5 | 4 | 3 | 0 |
| 1463 | 60 | RL | 74 | 13830 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 5 | 5 | 1997 | 1998 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | PConc | Gd | TA | No | GLQ | 791 | Unf | 0 | 137 | 928 | GasA | Gd | Y | SBrkr | 928 | 701 | 0 | 1629 | 0 | 0 | 2 | 1 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1997 | Fin | 2 | 482 | TA | TA | Y | 212 | 34 | 0 | 0 | 0 | 0 | 0 | MnPrv | 0 | 0 | 3 | 2010 | WD | Normal | 0 | 4 | 1 | 4 | 5 | 3 | 3 | 3 |
| 1464 | 60 | RL | 78 | 9978 | Pave | 0 | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 6 | 1998 | 1998 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 20 | TA | TA | PConc | TA | TA | No | GLQ | 602 | Unf | 0 | 324 | 926 | GasA | Ex | Y | SBrkr | 926 | 678 | 0 | 1604 | 0 | 0 | 2 | 1 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 1998 | Fin | 2 | 470 | TA | TA | Y | 360 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 2010 | WD | Normal | 0 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 1465 | 120 | RL | 43 | 5005 | Pave | 0 | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | 8 | 5 | 1992 | 1992 | Gable | CompShg | HdBoard | HdBoard | None | 0 | Gd | TA | PConc | Gd | TA | No | ALQ | 263 | Unf | 0 | 1017 | 1280 | GasA | Ex | Y | SBrkr | 1280 | 0 | 0 | 1280 | 0 | 0 | 2 | 0 | 2 | 1 | Gd | 5 | Typ | 0 | 0 | Attchd | 1992 | RFn | 2 | 506 | TA | TA | Y | 0 | 82 | 0 | 0 | 144 | 0 | 0 | 0 | 0 | 0 | 1 | 2010 | WD | Normal | 0 | 4 | 1 | 5 | 5 | 4 | 3 | 0 |
| 1466 | 60 | RL | 75 | 10000 | Pave | 0 | IR1 | Lvl | AllPub | Corner | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 5 | 1993 | 1994 | Gable | CompShg | HdBoard | HdBoard | None | 0 | TA | TA | PConc | Gd | TA | No | Unf | 0 | Unf | 0 | 763 | 763 | GasA | Gd | Y | SBrkr | 763 | 892 | 0 | 1655 | 0 | 0 | 2 | 1 | 3 | 1 | TA | 7 | Typ | 1 | TA | Attchd | 1993 | Fin | 2 | 440 | TA | TA | Y | 157 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2010 | WD | Normal | 0 | 4 | 1 | 4 | 5 | 3 | 3 | 0 |
# Predict prices for test data
#house_test <- read.csv('/Users/letiix3/Desktop/Data-605/Week-15/House_Price/test.csv')
p2_test_final <- p2_test %>%
dplyr::select_if(is.numeric) %>%
replace(is.na(.),0)
prediction <- predict(lm_final.lm, p2_test_final, type = "response")
head(prediction)
## 1 2 3 4 5 6
## 108822.2 182340.8 185503.7 239686.1 171483.0 188038.8
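Before building a submission, it can help to sanity-check the predictions. The sketch below uses simulated stand-in data (the `train` and `test` frames and the `area` and `price` columns are hypothetical, not the housing data). It confirms that every predictor in the formula exists in the new data, requests prediction intervals from `predict()`, and flags missing or non-positive prices, which a linear model on raw prices can produce and which would be invalid if the score is computed on log prices.

```r
set.seed(3)
# Simulated stand-in data (hypothetical column names, not the housing data).
train <- data.frame(area = runif(200, 500, 3000))
train$price <- 20000 + 90 * train$area + rnorm(200, sd = 15000)
test <- data.frame(Id = 1:5, area = c(600, 1200, 1800, 2400, 2900))

fit <- lm(price ~ area, data = train)

# Every predictor in the formula must be present in the new data.
needed <- all.vars(formula(fit))[-1]   # drop the response
stopifnot(all(needed %in% names(test)))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr).
pred <- predict(fit, newdata = test, interval = "prediction", level = 0.95)
submission <- data.frame(Id = test$Id, SalePrice = pred[, "fit"])

# Flag rows that are missing or non-positive.
bad <- is.na(submission$SalePrice) | submission$SalePrice <= 0
print(submission)
print(sum(bad))
```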
# Preparing data frame for submission
kag_pred <- data.frame(Id = p2_test_final$Id, SalePrice = prediction)
head(kag_pred)
dim(kag_pred)
## [1] 1459 2
# commenting out to not create new file
#write.csv(kag_pred, file = "tns_submission_prediction.csv", row.names=FALSE)
#-Kaggle Confirmation